Kernel Planet

September 02, 2010

Valerie Aurora: We now resume our regularly scheduled hiking

About four years ago, I started feeling like crap. Things went downhill from there. I tried going to doctors, not going to doctors, medications, taking no medications, exercising to exhaustion, not exercising at all, eating vegan, eating mostly meat, not eating at all, etc., etc., etc., ad nauseam. I think the only thing I didn’t try was acupuncture, but it was next on my to-do list.

Through all this my friends were awesome to me, if worried. So I want to let my friends know that, as of a month ago, I’m back to health, physically and mentally – so much so that I’m going on an 18-mile hike in the Grand Canyon next week. One of the hardest parts of the last few years was giving up any sort of serious hiking and it’s thrilling to be back on the trail.

If you’re a friend and curious about the details, you can email me. But if you are as tired of talking about my health as I am, you won’t. :) I do have a word of advice for my fellow workaholics: If you’re completely stressed out, and it just seems to get worse, go read up on adrenal exhaustion.

Thanks again to my friends for being so understanding and kind while I figured this out. I’m really looking forward to doing something other than lying on my couch in a daze all day long.


September 02, 2010 07:24 PM

Harald Welte: Motorola announces "Ming" phone with Android

For those who don't know: The Motorola Ming was the A1200, a commercially very successful Linux-based phone in China and other parts of Asia, using the EZX software platform, i.e. the kind of hardware for which we once built the OpenEZX software.

Motorola has recently announced that they will follow up with some Android-based Ming phones. It is my suspicion that apart from some mechanical design aspects, those phones will not resemble the Ming in any way, neither on the baseband hardware side, nor on the application processor side, and particularly not on the software side.

So it's probably nothing more than a marketing coup, trying to connect to successes of the past. Not interesting from the OpenEZX point of view, I guess.

September 02, 2010 02:00 AM

September 01, 2010

Harald Welte: More GPL enforcement work again.. and a very surreal but important case

In recent days and weeks, I've been doing a bit more work on the gpl-violations.org project than during the last months and years. I wouldn't say that I'm happy about that, but well, somebody has to do it :/

Right now I'm facing what I'd consider the most outrageous case that I've been involved in so far: A manufacturer of Linux-based embedded devices (no, I will not name the company) really has the guts to go before a court and sue another company for modifying the firmware on those devices. More specifically, the only modifications to program code are on the GPL licensed parts of the software. None of the proprietary userspace programs are touched! None of the proprietary programs are ever distributed either.

If that manufacturer were to succeed with such a lawsuit, it would set a very nasty precedent and jeopardize the freedom of users of Linux-based embedded devices. It would be a direct blow against projects that provide "homebrew" software for embedded devices, such as OpenWRT and many others.

I've seen many weird claims and legal strategies when it comes to companies trying to deprive developers of their freedom to modify and run modified versions of Free Software. But this is definitely so weird that I still feel like I'm in a bad dream. This can't be real. It feels too surreal.

It's a pity that I cannot speak up more about the specific company in question right now. I'm desperately looking forward to the point in time where I can speak up and speak out about what has been happening behind the scenes.

September 01, 2010 02:00 AM

August 31, 2010

Evgeniy Polyakov: Want to do it good - do it yourself

SDL_Init(SDL_INIT_VIDEO | SDL_INIT_AUDIO);

Python does not have an audio library that can play music asynchronously and control it in close enough to real time. Mainly I need a fast audio start/stop capability, triggered from Python code when I press and release different keys on a software piano keyboard.

So I will write it in C using SDL and SDL_mixer library in particular:

Mix_OpenAudio(audio_rate, audio_format, audio_channels, audio_buffers);
music = Mix_LoadMUS("music.ogg");
Mix_PlayMusic(music, 0);
Mix_HaltMusic();

The latter two functions can be used to control audio flow, and the output will be hardware mixed with other system sounds out of the box with SDL_mixer. SDL_mixer supports a wide variety of formats including OGG Vorbis and MP3.

And in a week or so I will publish a nice project I spent a week on. Actually it can be announced right here, but it does not yet work with production servers.
I created a simple enough photo uploader for the Shotwell photo management application which uses the Yandex.Fotki photo hosting service, the largest one in Russia, which I use quite frequently now. I hope my 1000-line patch will be accepted (some people will even bring me beer for that :)

If I understood correctly Shotwell will replace F-Spot in modern distros.

Shotwell is written in the Vala language, which is kind of like C#, but its compiler is sooo weird that it is very unlikely I will return to it anytime soon. But Java-style memory management (i.e. when you do not care about malloc/free) as well as compiler-forced exception handling (one has to use try/catch blocks when some function can throw exceptions) just make me relax during programming...

August 31, 2010 10:39 PM

Michael Kerrisk (manpages): Coming soon: The Linux Programming Interface

My book, The Linux Programming Interface, is now just a few days away from leaving the printer. (Most likely, people in the US who order online now will see the book before I even get a copy here in Germany.)

I've built out the content of the book web site with further information about the book, including:

August 31, 2010 06:00 AM

August 30, 2010

Dave Miller: How GRO works

All modern device drivers should be doing two things: first, they should use NAPI for interrupt mitigation plus simpler mutual exclusion (all RX code paths run in software interrupt context just like TX), and second, they should use the GRO NAPI interfaces for feeding packets into the network stack.

Like just about anything else in networking, GRO is all about increasing performance. The idea is that we can accumulate consecutive packets (based upon protocol specific sequence number checks etc.) into one huge packet, then process the whole group as one packet object. (In Network Algorithmics this would be principle P2c: shift computation in time, share expenses, batch.)

GRO helps significantly on everyday systems, but it helps even more strongly on machines making use of virtualization, since bridging streams of packets is very common and GRO batching decreases the number of switching operations.

Each NAPI instance maintains a list of GRO packets we are trying to accumulate to, called napi->gro_list. The GRO layer dispatches to the network layer protocol that the packet is for. Each network layer that supports GRO implements both a ptype->gro_receive and a ptype->gro_complete method.

->gro_receive attempts to match the incoming skb with ones that have already been queued onto the ->gro_list. At this time, the IP and TCP headers are popped from the front of the packets (from GRO's perspective; the actual normal skb packet header pointers are left alone). Also, the GRO'ability state of all packets in the GRO list and of the new incoming SKB is updated.

Once we've committed to receiving a GRO skb, we invoke the ->gro_complete method. It is at this point that we make the collection of individual packets look truly like one huge one. Checksums are updated, as are various private GSO state flags in the head 'skb' given to the network stack.

We do not try to accumulate GRO packets infinitely. At the end of a NAPI poll quantum, we force flush the GRO packet list.

For ipv4 TCP there are various criteria for GRO matching.

Certain events cause the current GRO bunch to get flushed out. For example:

The most important attribute of GRO is that it preserves the received packets in their entirety, such that if we don't actually receive the packets locally (for example we want to bridge or route them) they can be perfectly and accurately reconstituted on the transmit path. This is because none of the packet headers are modified (they are entirely preserved) and since GRO requires completely regular packet streams for merging, the packet boundary points are known precisely as well. The GRO merged packet can be completely unraveled and it will mimic exactly the incoming packet sequence.

GRO is mainly the work of Herbert Xu. Various driver authors and others helped him tune and optimize the implementation.
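
The batching and unraveling described above can be modeled with a small toy in Python. This is only a sketch of the idea, not the kernel's implementation (which is C and tracks far more state): the names Packet, GroEntry and ToyGro are made up for illustration, the flow key is a plain tuple, and the only merge criterion used here is "same flow, next expected sequence number".

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    flow: tuple      # hypothetical flow key, e.g. (src, dst, sport, dport)
    seq: int         # sequence number of the first payload byte
    payload: bytes


@dataclass
class GroEntry:
    flow: tuple
    next_seq: int                                 # sequence number we expect next
    segments: list = field(default_factory=list)  # original packets, kept unmodified


class ToyGro:
    """Toy model of GRO batching; loosely mirrors napi->gro_list."""

    def __init__(self):
        self.gro_list = []    # batches still being accumulated
        self.delivered = []   # batches handed to the "network stack"

    def receive(self, pkt):
        for entry in self.gro_list:
            if entry.flow != pkt.flow:
                continue
            if entry.next_seq == pkt.seq:
                # Consecutive packet of a known flow: append to the batch.
                entry.segments.append(pkt)
                entry.next_seq += len(pkt.payload)
                return
            # Same flow but not consecutive: flush the held batch first.
            self._complete(entry)
            break
        self.gro_list.append(
            GroEntry(pkt.flow, pkt.seq + len(pkt.payload), [pkt]))

    def _complete(self, entry):
        self.gro_list.remove(entry)
        self.delivered.append(entry)

    def flush(self):
        # Models the forced flush at the end of a NAPI poll quantum.
        for entry in list(self.gro_list):
            self._complete(entry)


def unravel(entry):
    """Return the original packets of a batch, in arrival order."""
    return list(entry.segments)
```

Because each batch keeps the original packets untouched, unravel() gives back exactly the incoming sequence for that flow, which is the property that makes bridging and routing of merged packets safe.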

August 30, 2010 08:46 PM

August 29, 2010

Rik van Riel: More small 160m antennas

The last series of compact 160 meter antennas were not that satisfactory. Luckily, there is an easy, relatively compact and better-performing 160 meter antenna design available, created by Rudy Severns N6LF ("Another Way to Look at Vertical Antennas") and described in practical notes on John Tait's Antennae for the Low Bands page.

With some creative modeling, we can explore and possibly improve this antenna design.

August 29, 2010 10:30 PM

August 26, 2010

Dave Miller: Converting sk_buff to list_head.

I've been trying to make this happen, off and on, for at least two years now. Most of the kernel is straightforward and uses the skb_*() interfaces we have for manipulating skb objects on a list.

So for those, simply tweaking the interfaces in skbuff.h will make them all "just work".

However there are a few other spots in the kernel which manipulate the SKB list pointers directly:

I'm taking another stab at this, and hopefully I can work out these wrinkles. It'd be a really nice change because a lot of uses of "struct sk_buff_head" which don't care about the spinlock or the packet count can be converted to simply "list_head", saving serious space in various data structures.

August 26, 2010 11:40 PM

Kernel Podcast: Update on the LKML Podcast

Folks,

I’ve been super-crazy-nuts busy with my dayjob, but I do have podcasts up to Aug. 1 ready to upload from several weeks ago (literally didn’t have time to post them yet). I will do some catchup but I think the best thing is to just start with the latest stuff and move on, missing a couple of slow weeks from August. Sorry, it takes a lot of work. Sometimes this thing lags, etc. I don’t plan to stop doing it, I just have to prioritize work and sanity (what’s left) :)

Jon.

August 26, 2010 11:03 PM

August 25, 2010

Harald Welte: Convert RSS feed subscriptions from N810 feed reader to Android com.meecal.feedreader

I'm subscribed to a considerable number of RSS feeds, and so far I actually used to read them all on my Nokia N810, which is more or less permanently located at the bedside table.

Now I wanted to import all the subscriptions into an Android RSS feed reader on the Galaxy S. Unfortunately the feed reader that I found most usable doesn't have OPML import. However, looking at its sqlite3 database for feed subscriptions, it was pretty easy to come up with a small perl script to generate "INSERT" statements for all the feeds from the N810 OPML file. In case anyone is interested, the script is available from here.

If you have any suggestions on a good Android RSS reader that can manage a large number of subscriptions and put them into a tree/hierarchy of groups, feel free to let me know.
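
For readers who want to try the same trick, here is a rough Python sketch of the approach (the original script is perl and is not reproduced here). The table name "feeds" and the columns "title" and "url" are placeholders I made up; the real schema of the feed reader's database has to be looked up with sqlite3 first.

```python
import xml.etree.ElementTree as ET


def sql_quote(value):
    """Quote a string as an SQL literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def opml_to_inserts(opml_text, table="feeds"):
    """Turn the feed entries of an OPML document into INSERT statements.

    The table and column names are hypothetical placeholders.
    """
    root = ET.fromstring(opml_text)
    statements = []
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if not url:
            # Outlines without xmlUrl are folders/groups, not feeds.
            continue
        title = outline.get("title") or outline.get("text") or url
        statements.append(
            "INSERT INTO %s (title, url) VALUES (%s, %s);"
            % (table, sql_quote(title), sql_quote(url)))
    return statements
```

The resulting statements can be written to a file and fed to the sqlite3 command line tool against a copy of the database. Folder outlines are skipped, so any grouping in the OPML file is lost in this sketch.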

August 25, 2010 02:00 AM

August 22, 2010

Jaya Kumar: Electoral Fraud

Shame on the government for the arrest of security researcher, Hari Prasad, see USENIX letter. It looks very likely that the Election Commissioner, SY Quraishi's claims that the electronic voting devices are tamper-proof are totally bogus. I've always wondered whether a true majority of people were really voting in such obviously corrupt politicians like the Maino mafioso family, well, now we have to suspect, was it all likely to have been due to electoral fraud on a massive scale?

August 22, 2010 03:44 AM

Harald Welte: India jails activist doing research on weak voting machine security

According to several sources such as indianevm.com, Hari Prasad has been arrested. He is part of a team of IT security researchers that gathered evidence to demonstrate how incredibly weak the security of India's voting machines is. For more details, read the indianevm.com article linked above, and the various quotes/links in it.

This is very upsetting. They should jail those who have authorized the deployment of such an insecure system in the first place. Those are the people responsible - not some researchers who go out of their way to uncover the technical problems to warn the general public about the inherent risks of this technology.

I sincerely hope that the authorities will understand the grave mistake they're making here. Don't shoot the messenger. It's not his fault that engineers, engineering management and/or regulatory government authorities have permitted such a system in the first place.

August 22, 2010 02:00 AM

Evgeniy Polyakov: Recent elliptics changes

In the meantime I added stream write support to the HTTP elliptics network frontend. With this change it is possible to atomically append data to existing records (in backends which support it - currently that is the file IO backend).

Thus the HTTP interface becomes very useful for stream-like data updates in various projects, from log saving to audio/video recording.

Another major change is an IO backend update: I finally dropped the TokyoCabinet database in favour of libeblob. My main objection to TC was its extremely slow performance when the index swaps to disk. I dropped the VM cache during a test and TokyoCabinet never recovered, even though the amount of RAM was enough to hold the whole database (I used several million 10k objects and the database size was close to 10 GB while the machine had 24 GB of RAM).

Also, the TC version I tested (1.4.44 iirc) was very unstable - I even wrote a test script to restart the upload, but to my shame I did not contact the author with cores and backtraces. When I tried the .23 version a year ago it was rock stable, so I expect that things were fixed in newer versions.

Another TC issue is related to multi-threading: this database just does not allow multiple users to read and write data simultaneously - it locks the whole database each time we start a transaction, which is needed to protect against parallel writes. This was not an issue in the elliptics network tests, though.

And now I dig back into Python and morphological data processing... I tested a handful of phrase generation techniques, but none provided good results for words which were not present in the dictionary. I have to find an answer.

August 22, 2010 12:28 AM

August 21, 2010

Harald Welte: Started to play with the Galaxy S (GT-I9000) phone

For many years I've been on a more or less consistent hunt for a reasonably open and free mobile phone. This started in 2004 with OpenEZX, has continued with Openmoko, project gnufiish and has resulted in a bit of peeking and poking in the Palm Pre. However, none of those projects ever had the success I was hoping for:

So I've constantly been on the watch for new devices that are coming out. Most of the phones you can buy in recent years are either running proprietary software like Windows Mobile, Symbian, Apple's iPhone-OSX - or they run Android but then use some integrated Qualcomm Smartphone-on-a-chip product. The problem with the latter (from a Free Software point of view) is that Qualcomm is very secretive about their products, does not provide any kind of public documentation, and the ever-increasing integration between application processor and baseband processor makes it more difficult to run custom software on them.

The Samsung Galaxy S (GT-I9000) seemed like a good candidate to me, for several reasons:

So right now I'm in the exploration phase, making myself familiar with the bootloader, the flashing process, the userspace ABI of the custom (GPL licensed) kernel drivers, etc. It's a fairly pleasant experience so far, and I now have a debootstrap'ed Debian lenny on an additional ext2 partition on the SD card. This provides me with an actually useful userland I can chroot() into, with tools such as lsof, strace, ltrace, tcpdump, etc. to do some more exploration of the phone.

The only real ugliness on the software side so far is the use of proprietary Samsung filesystems (RFS/TFS4). The only reason those filesystems existed, as far as I can tell, was to run legacy filesystems like FAT on top of raw NAND or OneNAND flash. This is mainly necessary if you want to export e.g. a FAT partition via USB Mass Storage to a Windows PC. However, the GT-I9000 doesn't have any OneNAND, but only an internal moviNAND (basically an SD card in a BGA package that you can solder on the board). MMC/SD cards already include the wear leveling algorithm, so there is absolutely no point (from what I can tell) in running the RFS/TFS4 stack.

In fact, in several forums people are complaining about the slow I/O performance of the Galaxy S, and they have a much better performance when using ext2/ext3 directly on that moviNAND device.

August 21, 2010 02:00 AM

August 19, 2010

David Woodhouse: 19 Aug 2010

I wanted to update a Nokia E71 to the latest firmware. So I booted a Windows 7 VM, went to the Nokia web site and downloaded the update tool. It took about a quarter of an hour to download 33MiB over my crappy ADSL line.

When it finally finished downloading, I installed it and rebooted as it asked me to, then tried to start it using the icon it had installed on the desktop.

It told me there was an update available, and I couldn't use it until I updated. I muttered darkly at this idiocy, but let it update. It spent another quarter of an hour downloading, and only then did it check and tell me that it couldn't proceed because it needed to be run as Administrator.

So I right-clicked on it and used the 'run as Administrator' option, and watched it download itself for the third time. But still it failed, complaining that I had to run it as Administrator.

So I logged out completely and logged in using the Administrator account, and I ran it again. After downloading the entire thing for a fourth time it failed again, still complaining that it has insufficient privileges and needed to be run as the Administrator.

I am stunned — Nokia really ought to be ashamed at this crap.

August 19, 2010 12:32 AM

August 18, 2010

James Morris: Linux Security Summit 2010 – Wrapup

The first Linux Security Summit (LSS) was held last Monday, 9th August in Boston, in conjunction with LinuxCon 2010 North America.

This event has its roots in the Linux security development community which emerged in the early 2000s, following the development of LSM and with the incorporation of a wide range of new security features into Linux. We’d previously met, as a community, in OLS BoF sessions, various conference hallway tracks, and at project-specific events such as the SELinux Symposium. There have also been very successful security mini-summits at LCA in 2008 and 2009, and a double security track at the 2009 Plumbers Conference.

This year, we tried to broaden the scope of the event as far as possible: to situate it with a more general Linux conference (than Plumbers, for example), and bring in not only developers, but the wider end-user community as well. We had great attendance from the security developer community, with pretty much all major areas of development represented, although not as many end-users as we’d hoped for. We were, however, easily able to fill up a day’s worth of bleeding edge technical discussions, with around 70 developers in attendance throughout.

Presentations were limited to thirty minutes, including discussion, to help ensure an interesting and stimulating event, aimed at fostering ongoing discussion and engagement. In this sense, it seems we were generally successful, with several strong discussions arising during presentations. There were many follow-up meetings between developers, end users and vendors during the remainder of LinuxCon, which was very gratifying to see.

Z. Cliffe Schreuders sparking a lively debate about security usability

Mobile security was one of the core issues discussed at LSS (and during the rest of the week), with the year of the Linux desktop now apparently permanently canceled due to smartphones and similar devices. There are certainly many very difficult and exciting challenges to be met in this area over the coming years, and it was great to be able to have the MeeGo security folk present on their work.

Another important area (as always), is security usability, with new high-level policy language work presented by Josh Brindle (lolpolicy). Z. Cliffe Schreuders presented the results of a comparative usability vs. efficacy study from his FBAC-LSM project, sparking some very robust and productive discussion. (Certainly from an SELinux point of view, we are trying to learn as much as possible from this kind of research, which is otherwise very thin on the ground).

Stephen Hemminger presented on the topic of integrating security into a router (Vyatta). This kind of presentation is really very useful to have when there are so many security developers present — it helps us better understand the nature & scope of security requirements for a wider range of real-world users.

Brad Spengler’s presentation addressed the difficult area of protecting the kernel itself, arising from his experiences developing grsecurity. As most of our protection mechanisms operate within the kernel, attacks on the kernel can render these mechanisms useless, so it is important to try and harden the kernel as much as possible. Brad outlined some areas which we still need to address upstream (or in distros, at least), a topic which was further developed by Kees Cook in his talk on Out of Tree security features.

IMHO, we face a number of challenges in this area: 1) core kernel developers are not always receptive to enhanced security, 2) the solutions proposed often are technically not acceptable to upstream (and require a lot of persistent reworking) and 3) we don’t have a huge pool of available expertise upstream in these areas. Kees has taken on some of the challenges here, and any additional contributors here would certainly be welcome, although I would not anticipate any smooth sailing.

We also had project updates from Mimi Zohar on EVM, Karl MacMillan on security management, Dan Walsh on SELinux Sandbox, and Stephen Gallagher on SSSD.

The panel discussion kicked off with a session on the viability of a standard Linux security API. It was good to get a discussion going here, with well-considered input from key developers. It seems the consensus is that our various security models are too fundamentally different to develop the kinds of APIs you might see in proprietary OSes, although the issues are certainly recognized (e.g. hindered ISV and end user adoption of security) and people are thinking about solutions. There are many difficult, open issues in this area, although we really don’t have the option of not solving them — as a society we’re ever increasingly reliant on computing, and thus also on its security.

Casey Schaufler leading the security API panel discussion

There’s already been quite a lot of feedback from attendees on the format and co-location of future events. There was some talk of aiming at a more purely technical conference (e.g. Plumbers), although it seems to me that there was a great benefit in being able to assemble a critical mass of security developers alongside the other LinuxCon developer mini-summits, as well as general end users, vendors etc. A couple of people also mentioned the Collab summit, although I wonder if being invite-only may limit the overall scope of participation. We may also look at a two-day event next year, to allow for keynotes, a few selected longer talks for major new projects, and break-out sessions.

If anyone has feedback or ideas, please join the LSS mailing list and post your thoughts.

Slides from the presentation are now linked from the schedule (where available), and I’ve posted a brief photo set on flickr. If you post any photos or blogs from the event, please tag them with #lss2010, and drop me an email, so I can link to them from the web site.

Overall, it seems that we had a very productive and collaborative event, bringing together key people to discuss ongoing and emerging challenges in Linux security. Indications thus far are that we should expect to see useful developments arise out of discussions begun at this summit, in some of the areas mentioned above.

The Linux Foundation organizers seamlessly provided us with everything we could need in terms of a venue and support — allowing us to concentrate on the program itself. Many folk worked behind the scenes, but I’d like to especially thank Angela Brown, C. Craig Ross and Amanda McPherson.

Also thanks to everyone who presented and attended, and to the program committee, who worked quickly to review and evaluate all the proposals.

August 18, 2010 01:06 AM

August 17, 2010

Harald Welte: Doing RFID related research and development again

Somewhat to my surprise, I have become involved in RFID research again, something I hadn't really done much of since my involvement in the OpenPCD and OpenPICC projects some four to five years ago.

It's a lot of fun, and I didn't seem to forget much. What really bothers me a bit is that the OpenPCD / librfid / OpenPCD integration never really was completed, and that libnfc doesn't work with OpenPCD. Let's hope I'll somehow find some time to change this. It just feels wrong that OpenPCD was the first hardware project created to encourage (security) research into RFID, and now all the current tools only run on the Proxmark or on proprietary readers...

August 17, 2010 02:00 AM

August 16, 2010

Evgeniy Polyakov: Python loving psto

I love languages with a rich standard library. Python is just awesome in this regard.

But the amount of already written extensions is outstanding - I parsed HTML using regexps in Lisp, but in Python with python-lxml it took just a couple of hours to parse rather broken HTML using xpath and small string matching calls.

I spent one day writing a parser of non-structured morphological data (frequently with suddenly unwanted symbols or additional tags within) from aot.ru to create a quite large (300k+ morphemes) Russian dictionary, and then to store it into a prefix array and output it as an XML file.

Yes, default CPython sucks with threads, it is not (yet) suitable for trivial audio processing (play and stop sound when pressing/releasing a key), but it is just bloody ubergood at high-level prototyping.

Returning to morphological analysis, I'm about to start rewriting my experimental knowledge extraction and grammatic generation 'engine' from Lisp to Python. And I expect to have some cool results with it soon.

August 16, 2010 02:45 PM

August 14, 2010

Pavel Machek: Tandem jump

So I enjoyed tandem jump from 4 kilometers... and I really enjoyed it. It is just slightly on the expensive side.

Oh, and I'm now the proud owner of a horse... or rather a big pony. Fjord... apparently of "red dun" type. If you want to see a rather free and yellow horse, come to Prague.

August 14, 2010 06:52 PM

Harald Welte: World's first 20 minute voice call from a Free Software GSM stack on a phone

As Dieter Spaar has pointed out in a mailing list post on the OsmocomBB developer list, he has managed to get a first alpha version of TCH (Traffic Channel) code released, supporting the FR and EFR GSM codecs.

What this means in human readable language: He can actually make voice calls from a mobile phone that runs the Free Software OsmocomBB GSM stack on its baseband processor. This is a major milestone in the history of our project.

While Dieter has been working on the Layer1 TCH support and the setup of the voiceband path in the analog baseband chip (audio ADC/DAC), Andreas Eversberg has been quietly working on getting call control of Layer3 into a state where it can do all the signalling required for mobile-originated and mobile-terminated calls.

Combining their work, they have been able to make a 20 minute long voice call from a baseband processor running a Free Software GSM stack. For all we know, it is the first time anything remotely like this has been done using community-developed Free Software. Five years ago I would have thought it impossible to pull this off with a small team of volunteers. I'm very happy to see that I was wrong, and we actually could do it. With fewer than half a dozen developers, in less than nine months of unpaid, spare-time work.

Sure, the next weeks and months will be spent on bringing the code from alpha level to something more stable, fixing known issues and known bugs, etc. But I'm confident the biggest part of the work on the OsmocomBB stack is behind us. Big thanks to the developer team driving this project forward.

August 14, 2010 02:00 AM

August 13, 2010

Rusty Russell: fcntl lock starvation and TDB

The Trivial DataBase (ccan variant here) uses fcntl locks for consistency: records are chained off a fixed-size hash table (or the free list), and a 1-byte fcntl lock at the offset of the chain head protects all records in that chain.

There’s also a tdb_lockall() function which grabs a lock across all the hash chains at once to let you do arbitrary atomic munging.  This works because fcntl() locks have an offset and length: you can lock arbitrary byte ranges.

Unfortunately, tdb_lockall() is subject to starvation, at least under Linux.  This is because the kernel merely checks whether a lock is available and gets it if it can, rather than queuing behind someone else who wants a superset of the lock.  So single byte lockers come in and out while the whole-file locker waits for them all to go away.

I’m not sure this is wrong, and as it’s simpler than the alternative, I’m not prepared to change it just yet.  Fortunately, there’s a way of avoiding this starvation in userspace, which occurred independently to both Volker Lendecke and me.  I called this variant tdb_lockall_gradual(), in which we try to lock each chain one at a time so we compete with the single-chain lockers on fair terms.  My first naive thought was to try to lock all the chains one at a time in order, nonblocking, then go back and retry (blocking) any we failed to get.  This is slow, and can deadlock against another process doing the same thing.  Volker’s suggestion was much nicer: we do a non-blocking lock, and if that fails we divide and conquer.  If we get down to a single record, we do a blocking lock.
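
The divide-and-conquer idea can be sketched in a few lines of Python using the standard fcntl module. This is an illustration of the technique as described above, not TDB's actual code (which is C): the function names are invented for this sketch, each chain is assumed to be protected by one byte at a consecutive offset, and a single byte is first tried non-blocking before falling back to a blocking lock.

```python
import errno
import fcntl
import os


def try_lock(fd, start, length):
    """Try a non-blocking exclusive lock on bytes [start, start + length)."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB,
                    length, start, os.SEEK_SET)
        return True
    except OSError as exc:
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return False  # somebody else holds part of the range
        raise


def wait_lock(fd, start, length):
    """Take a blocking exclusive lock on bytes [start, start + length)."""
    fcntl.lockf(fd, fcntl.LOCK_EX, length, start, os.SEEK_SET)


def lock_gradual(fd, start, length, try_lock=try_lock, wait_lock=wait_lock):
    """Lock a byte range without starving behind single-byte lockers.

    First try the whole range non-blocking. If that fails, split the
    range in half and recurse; once down to a single byte, block on it.
    """
    if try_lock(fd, start, length):
        return
    if length == 1:
        wait_lock(fd, start, length)
        return
    half = length // 2
    lock_gradual(fd, start, half, try_lock, wait_lock)
    lock_gradual(fd, start + half, length - half, try_lock, wait_lock)
```

With one contended byte out of N, this makes roughly 2*log2(N) non-blocking attempts before the single blocking lock, rather than N locks. The two lock helpers are passed in as parameters only so that the recursion can be exercised without real contention.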

I wrote  a test program which fires off N children, each of which grabs a random chain lock for 50-150 milliseconds before sleeping for 1 second, then repeating. The parent waits for a second, then tries to do a tdb_lockall() or tdb_lockall_gradual() depending on the commandline.  Running it five times and showing the average wait time for the lockall gives this:

Now, regarding performance.  While there can be 10000 hash chains, this isn’t as bad as it sounds.  The fast case is the uncontended one, and that’s as fast as before, and the contended case is already slow.  I annotated the source to print out how many blocking/nonblocking locks it’s actually doing.  Inevitably, if there’s contention, it will end up dividing down to a blocking lock, so log(numchains) locks before doing a blocking lock.

Processes  Blocking locks  Nonblocking locks  Seconds
5          0-2             1-27               0.03
50         8-12            93-111             0.20
500        13-21           130-170            0.29
5000       309-347         1660-1832          9.1

Sure, that’s a lot of locking when we’re competing with 5000 processes, but it’s less than the naive one per chain, and it’s clear that it’s not the cause of the slowdown (we’re doing fewer locks per second than the 5 processes case).
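
The locks-per-second comparison can be checked with a little arithmetic on the table. The figures are the ones quoted above; taking the midpoint of each range is an assumption made here just to get one number per row.

```python
# processes: ((blocking min, max), (nonblocking min, max), seconds)
ROWS = {
    5: ((0, 2), (1, 27), 0.03),
    50: ((8, 12), (93, 111), 0.20),
    500: ((13, 21), (130, 170), 0.29),
    5000: ((309, 347), (1660, 1832), 9.1),
}


def midpoint(value_range):
    low, high = value_range
    return (low + high) / 2


def locks_per_second(blocking, nonblocking, seconds):
    """Total lock calls (midpoint estimate) divided by the wait time."""
    return (midpoint(blocking) + midpoint(nonblocking)) / seconds


RATES = {procs: locks_per_second(*row) for procs, row in ROWS.items()}
```

On these midpoints the 5-process run issues roughly 500 lock calls per second and the 5000-process run roughly 230, so the extra locking by itself does not account for the 9.1 seconds.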

And anyway, locking the entire database cannot be a speed-critical operation.  Indeed, after the evidence here, I followed Volker’s suggestion to simply replace tdb_lockall() with the new implementation.

August 13, 2010 04:09 AM

Harald Welte: Wondermedia WM8505 Linux + u-boot source code

In recent months, a number of alleged GPL-violation reports have appeared regarding products (tablet computers, mini netbooks and the like) using the Wondermedia WM850x line of ARM SoCs. People have been contacting me, as I was working as VIA Open Source Liaison, and there is the general belief that VIA and Wondermedia Technology (WMT) are one company.

I had investigated this issue even before there were any reports, and I'd like to publicly state that:

Notwithstanding all of the above, Wondermedia was willing to provide the Linux kernel and u-boot source code of their SDK to me, so I can share it with the community. As indicated, they're not legally required to do this and I'm happy they do it anyway to show their good intentions.

You can download the released source code from the gpl-devices.org ftp-server, more specifically here are the latest Linux kernel (modified 2.6.29 android derivative) and u-boot source code archives.

This software is provided without any kind of support. If you see some GPL related legal problems (e.g. you believe it is incomplete), don't hesitate to contact me. To the best of my knowledge WMT (basically a small hardware start-up with a small software development team) has no resources to actively push any of this into mainline.

August 13, 2010 02:00 AM

August 12, 2010

Evgeniy Polyakov: Python threads: scheduling

Python threads suck! Their scheduling is not deterministic, i.e. one does not know when a thread will run or for how long.

And the main problem is that a thread may well not give up control when it hits a syscall and/or a lock. So it is not possible to control it using a lock shared between the main thread and the spawned one. A PyQt signal will not be handled with enough priority to interrupt a spawned thread which drops into thread.lock.

Eventually the spawned thread will give up control, but this non-determinism is not acceptable for the piano project I work on - we have to start and stop sound when a key is pressed (modulo some short enough timeout, but without going to extremes like using low-latency jack audio servers).

August 12, 2010 01:03 PM

Rusty Russell: Bob Jenkins and lookup8.c

I pulled Bob Jenkins’ superlative lookup3.c into ccan some time ago, and with the work on TDB2 (aka 64-bit TDB) I wondered if there was a 64-bit variant available.  I didn’t look hard enough: I should have checked his top-level page and seen lookup8.c before wasting his time with an email :(

I did note in my mail that since lookup3.c can return a pair of 32 bit numbers, I could combine those to get a 64 bit hash “but I wondered if it was possible to do better…”
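
Combining the pair is plain bit packing. A minimal sketch follows; the function names are illustrative, and which word ends up in the high half is an arbitrary choice made here, not something taken from lookup3.c.

```python
MASK32 = 0xFFFFFFFF


def combine32(low, high):
    """Pack two 32-bit hash words into a single 64-bit value."""
    return ((high & MASK32) << 32) | (low & MASK32)


def split64(value):
    """Inverse of combine32(): return (low, high)."""
    return value & MASK32, (value >> 32) & MASK32
```

The result is 64 bits wide, but it is only as good as the two 32-bit words are independent of each other.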

His one-line reply was that he was working on a 128-bit hash now.  Respectful and informative.

The gold standard test of online manners is how you respond to a dumb random email.  A truly gracious gentleman/lady will respond as if you are far smarter than you appear.  Thanks Bob.  And 128-bit; I’m a little stunned…

August 12, 2010 07:15 AM

August 11, 2010

Evgeniy Polyakov: elliptics@eblob: 5000 rps of random IO requests in 1 Tb of data (100 millions of objects)

Legend.

elliptics network - distributed hash table storage
eblob - low-level data storage used as one of IO backends for elliptics network

Hardware: 2 E5530 servers (16 cores, 24 Gb of RAM), each connected to a SAS shelf (14 disks, ext4, raid10).
Data: 100 million objects (1042 Gb in total), spread roughly equally over the two server nodes.
Requests: random IO reads.

Result? We have it:


Reply time (left, in ms) @ number of requests per second (right, red inclined line)



Reply time (in ms) distribution

Net result: 3500 rps within 100 ms, 4000 rps within 200 ms.
Not that bad, I think...

August 11, 2010 04:16 PM

Pete Zaitcev: Google is not infallible

The failure of Google Wave is something to remember because it demonstrates very dramatically that Google is not infallible.

The history of Google is actually littered with failures (see Google Notebook), but what I think makes Google Wave different is how it was not supposed to fail. Look for example at Orkut. Orkut.com was a 20% project of Mr. Orkut, booted on a shoestring, a blind shot into a dynamic field. It's unclear how much of the blame for it could be placed at the feet of Mr. Orkut and how much of it was just Brazilians ruining it. But Wave is different. It was not a product of a random mediocre hacker like you and me, that grew larger than he could handle, too quickly. Wave was conceived by the omnipotent "Google Ph.Ds" in the very heart of Google Labs.

This is something to keep in mind when considering how much time to devote to studying and reimplementing Google Storage.

August 11, 2010 02:59 AM

August 10, 2010

Evgeniy Polyakov: New elliptics network release: 2.9.1

This day has come: I made a whole new and shiny elliptics network release. The amount of invasive changes is noticeable, yet I found a lot more that should be changed.

We made 30 intermediate releases after 2.9.0, which revealed a fair number of bugs and showed a number of different directions for further elliptics storage development.

A short changelog includes:

With this release I mark the 2.9 tree as experimental, since I will break compatibility soon to allow a faster fault recovery process.

Milestone TODO list includes:

Stay tuned!

And now it's time to get back to OLOLO-intellect and the related lexical and morphological analysis...

August 10, 2010 08:29 PM

Pete Zaitcev: tabled and OpenStack/Swift

When OpenStack and its storage component, Swift, were opened a few weeks ago, I thought that tabled could be safely abandoned. Indeed tabled and Swift basically do the same thing and their APIs are similar, but Swift is battle-tested code, whereas tabled is something Garzik and I hacked together for testing purposes. However, Jeff Darcy pointed out that although the API differences are minute, they are not absent, and as long as Amazon EC2 implies keeping your data in S3, applications must support S3. And as long as that continues, there's a point in having a no-cost, independent implementation of S3. So tabled lives on.

Still, I would expect that if Jeff were right, someone would've sent us a few patches by now. But as the changelog shows, that is not happening (I mean Bo and Colin sent some, but those were not off-shoots of a field deployment of tabled, they just hacked on Hail in general).

P.S. I forgot to mention that RHEV cloud uses S3 (and so, tabled) in the same way as Eucalyptus uses Walrus. I even thought about possible cross-pollination. If Eucalyptus people were to use tabled, it would buy them the redundancy- and replication-based data protection that tabled implements and Walrus does not. The downside is, it's not like data protection is important for VM images, and saddling Eucalyptus with a dependency on Hail seems unlikely to win many fans. Anyhow, tabled continues for Red Hat needs, that goes without saying. I am just considering Fedora and the general hack angle here.

August 10, 2010 05:58 PM

Evgeniy Polyakov: Reactor vs. thread-per-client models

I used to believe that the reactor (aka state-machine) model is always superior to the thread-per-client one, since threads are huge and context switches are CPU-pricey and slow.

At least that was the case 10 years ago, when the sky was bluer and the grass was greener. Now things have changed.
When I developed kevent - a kernel dispatching system for file descriptor, process and other events - I implemented it as a fairly complex state machine, which handled different event types and performed scheduling for each event type (like network AIO and so on).

In those days a state-machine system did not differ too much from one created on top of the thread-per-client model, especially in IO cases where clients frequently sleep waiting for IO completion.
Now that multi-core systems have become commodity hardware, we want to utilize all CPUs, which means creating more and more separate state machines, each one running on its own physical processor. Or we can create a pool of threads where each thread processes its own state machine.

In elliptics network a libevent-based state machine handles all network operations. Another state machine handles client commands and a third one glues them together (this one is especially visible when we forward data from one socket to another).

A system of complex structures is as weak as its weakest member. Thus, when we block in a directory-reading syscall which takes 10 minutes to populate a cold dentry and/or inode cache from a directory containing millions of records, that thread is completely useless for the other operations scheduled on it. This means no network transfers or other commands can be done while waiting for the blocking operation to complete. In some systems I worked with on top of elliptics there were as many as 10 thousand 'clients' per default 128 IO threads.

And in IO-heavy systems the number of such blocking operations exceeds the number of potentially non-blocking ones (like most writes and all socket events, since those signal only when something can be read/written and the sockets are non-blocking). As a proof of concept one may consider a recent Java NIO vs sync-IO benchmark (shorter and more popular HTML article).

Thus I decided to rewrite the event-handling model in elliptics network and use a simple thread-per-client approach. It will eliminate possible libevent bugs (and I found some of them in the 1.3 versions) as well as greatly simplify the processing logic. It will also make it possible to implement IO priorities, first by using ionice and then maybe by introducing automatic tools.
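
To make the thread-per-client shape concrete, here is a small Python sketch built on the standard socketserver module. It is an illustration only and has nothing to do with the elliptics code (which is not written in Python); EchoHandler, ThreadPerClientServer, start_server and echo_once are names invented for this example.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.StreamRequestHandler):
    """Runs in its own thread; blocking here stalls only this client."""

    def handle(self):
        for line in self.rfile:
            self.wfile.write(line)


class ThreadPerClientServer(socketserver.ThreadingTCPServer):
    daemon_threads = True        # do not keep the process alive for clients
    allow_reuse_address = True


def start_server(host="127.0.0.1", port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = ThreadPerClientServer((host, port), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def echo_once(address, payload):
    """Send one line to the server and return the echoed line."""
    with socket.create_connection(address, timeout=5) as sock:
        sock.sendall(payload + b"\n")
        return sock.makefile("rb").readline().rstrip(b"\n")
```

Each handler is ordinary sequential code, which is the simplification argued for above: a slow or blocked client occupies one thread instead of a shared state machine.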

August 10, 2010 11:58 AM

Arjan van de Ven: On the changing role of PowerTOP

I’m realizing that PowerTOP got released 3 years ago now. While not nearly as old as the Linux kernel, it’s time to look back and then forward again.

When PowerTOP originally was released, we had 3 categories of users in mind:
1) Developers of Open Source applications
2) Users of Laptops and other power sensitive devices
3) Distribution developers
I assumed PowerTOP would have a small, niche userbase and mostly wrote it for the few of us working on making Linux power usage better.

Pretty soon it was clear that end users jumped on the tool, and wanted more and more out of it. As a result, the “check the system for tweakables” section grew pretty quickly. Open Source developers also jumped on the tool and used it to fix the polling behaviour in their applications. Over time, PowerTOP grew more tips, but most of all, more diagnostics. The 1.13 version that Auke Kok recently released added support for reporting statistics on how effective the power saving in the SATA or audio driver is, how effective the new device runtime power management is but also which applications keep your disk out of power save mode. These are half-a-Watt kind of power savings each.

Now, three years later, I think that the user base of PowerTOP has changed. The various Linux distributions are doing, on average, a quite OK job on power saving. The result is that the value of PowerTOP to end users has been reduced significantly; many of the things PowerTOP checks for are just set correctly out of the box now. At the same time, the diagnostic use of PowerTOP is increasing. More and more, the answer to the “my system is using too much power, can you help me” is the “can you send me PowerTOP output”. With my distribution hat on (for MeeGo), I can say that I use PowerTOP a *lot* to find out how well we’re doing on power management, and often want even more diagnostics.

With the userbase moving from end users to experienced engineers, it’s time for the internals of PowerTOP to get redone. The original PowerTOP code base is still in use today, with many many things bolted on. The code has gotten pretty hard to read, and even harder to expand and add features to. Much code is duplicated between various pieces of PowerTOP that mostly do the same thing, with some small difference.

So it’s now time to rethink some of the code and make things much more scalable for adding new checks and features. In addition, the output also needs to improve to be more useful as a diagnostics tool. I’m thinking about adding a “generate a report” option, that basically gives a complete report card of the system.
This doesn’t mean I want to leave the end user behind; not at all. But in terms of new features, with all the low hanging fruit taken care of, some of the things PowerTOP needs to do are just a lot more technical than what PowerTOP 1.0 offered.

Over the weekend I’ve been prototyping a few things, and so far, I like where it’s going. There are a lot of things to add still (I’ve been mostly focusing on reporting CPU behaviour so far), but I’m hoping I have enough time in the next few weeks to get something that is usable for others to give feedback on.

August 10, 2010 02:02 AM

August 09, 2010

Evgeniy Polyakov: elliptics@eblob on 14-disks SAS raids

We got 2 servers (E5530, 16 cores, 24 Gb of RAM), each with a 14-disk SAS shelf configured as raid10.
Uploaded 30 million records (record size varies from 5 to 20 kb), 370 Gb of data in total.

Fired with random requests and got this graph:

2000 rps within 130-150 ms; judging by the dynamics we could get more, but we stopped the test.
Will run longer with an additional 60 million records (about 100 million objects in the storage in total).

Eblob index must take all available RAM on such a machine: with 15 million records on each (30 million total) we get about 9.9 Gb of virtual memory and 8.7 Gb of 'real' memory.

Should fit 100 million records on 2 nodes perfectly :)

In the meantime I cooked up the 2.9.1 release; its release candidate will pass final tests and I will make an announcement in a day or so. We have already had 30 internal releases prior to this one; it's time to publish the new shiny features in a new package.

Stay tuned!

August 09, 2010 03:22 PM

Paul E. McKenney: "The Trouble with Multicore" by David Patterson in July 2010 IEEE Spectrum

Patterson's article is quite interesting, and brings out some good points. I was of course happy that he mentioned Sequent, my old employer, despite the fact that the mention was on a list of “long-gone parallel hopefuls.” An unflattering mention, perhaps, but undeniably true.

However, I was especially happy to see the following sentence:

So rather than working on general programming languages or computer designs, we are instead trying to create a few important applications that can take advantage of many-core microprocessors.

Focusing on parallelization in the large is a great improvement over the traditional academic focus on parallelization in the small. All else being equal, the larger the software artifact, the larger the units of work, and the smaller the fraction of computational resources spent on communication. The less the communication, the better the performance, and usually the greater the scalability. So Patterson's pronouncement is a welcome change, especially given his group's earlier focus on small-scale computational kernels. I hope that the fact that Patterson has now joined the growing group of academics focused on parallelization in the large will encourage other academics to do the same.

Of course, I could raise a number of quibbles with the paper:

  1. The analogy of parallel processing with journalism (last full paragraph of the last column on page 30) misses the mark. Patterson notwithstanding, the fact is that most writers do in fact use parallel processing: there will be a reporter, a copy-editor, and so on. It is in fact quite common for authors of large works to acknowledge those who did research, fact-checking, and other tasks. Of course, to Patterson's point, there must be a limit to the degree of parallelism that can be achieved. But the success of things like Wikipedia indicates that the potential for parallelism is much larger than has been commonly thought.
  2. Patterson argues that desktop applications rarely have sufficient intellectual horsepower behind them to make good use of multicore systems (last sentence of page 31). History has shown, however, that it is not raw intellectual horsepower that is required, but rather experience and proper training.
  3. Patterson also seems to believe that parallel programmers should start small and work their way up to larger systems (last sentence of first paragraph of page 32). Sequent's experience indicates otherwise: by starting off with 30-CPU systems from the get-go, Sequent avoided the typical parallel-programming experience, which is to rewrite the program from scratch multiple times, first to accommodate parallelism at all, next to scale beyond two CPUs, next to get beyond the 16-32-CPU level, and so on. Diving into the deep end of the parallel-programming pool can be quite a bit cheaper and easier than gingerly paddling out from the shallows.
  4. Patterson complains that large systems (128 cores) are not being manufactured, and that software emulation is painfully slow (middle of third column on page 32). Such large systems have in fact been available for quite some time from a number of manufacturers. Of course, they are still quite expensive, which can certainly render them unavailable to most developers. However, there is little need for universities to fabricate them, unless of course they are conducting research on the hardware itself.

Finally, the box on page 31 entitled “Easy as Pi” deserves special attention. In this box, Patterson contrasts a sequential method for calculating the quantity π/4, namely summing the infinite series for the arctangent of one radian, with a parallel Monte Carlo method, which generates pairs of random floating-point numbers between -1 and +1, then counts the fraction that lie within the unit circle.
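
For reference, the two methods can be written down in a few lines of Python. This is a sketch for experimentation, not code from the article: the function names, the fixed seed and the single-threaded Monte Carlo loop are choices made here.

```python
import math
import random


def series_quarter_pi(terms):
    """Partial sum of 1 - 1/3 + 1/5 - ..., the arctangent series at 1."""
    return sum((-1) ** k / (2 * k + 1) for k in range(terms))


def monte_carlo_quarter_pi(samples, seed=0):
    """Fraction of random points in [-1, 1] x [-1, 1] inside the unit circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return inside / samples


def error(estimate):
    """Absolute distance of an estimate from pi/4."""
    return abs(estimate - math.pi / 4)
```

The series is an alternating one, so its error after n terms is below 1/(2n+1). The Monte Carlo samples are independent of each other, which is what makes that method easy to split across processors, but its statistical error shrinks only with the square root of the number of samples.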

How good are these algorithms?

August 09, 2010 12:06 AM

August 08, 2010

Evgeniy Polyakov: Python threads ...

Using 'threading' module apparently is not a very good idea

$ python /tmp/threads.py
# ... interrupt by Ctrl-C

Traceback (most recent call last):
  File "/tmp/threads.py", line 12, in <module>
    TestThread().start()
  File "/usr/lib/python2.5/threading.py", line 442, in start
    _sleep(0.000001)    # 1 usec, to let the thread run (Solaris hack)

$  cat /tmp/threads.py 
import threading, sys

test_num = 0
class TestThread(threading.Thread):
	def run (self):
		global test_num
		test_num = test_num + 1

num = 10000;
for x in xrange (num):
	TestThread().start()

print "var: ", test_num, "num: ", num

I created this trivial Python::threading example which spawns 10k threads that update a global variable, to find out whether the GIL will protect it and to check how fast thread creation is. It turned out that python threads are damn slow to create (and the traceback at the top confirms that - python sleeps there!). During this test the python interpreter ate about 1% of 16 available CPUs and there was only a single thread at a time, which means the GIL protects thread creation and execution. When I added time.sleep() there, the GIL was dropped to enter the syscall and procfs confirmed that there were many simultaneous threads created.
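
Two details of the test are worth spelling out. The GIL does not make "test_num = test_num + 1" atomic, because the read, the addition and the store are separate interpreter steps, and the script prints the counter without waiting for the threads to finish. A sketch of the same experiment with both points handled, written for Python 3 rather than the 2.5 used above (count_with_threads is a name made up here):

```python
import threading


def count_with_threads(num_threads):
    """Start num_threads threads that each bump a shared counter once."""
    counter = 0
    lock = threading.Lock()

    def bump():
        nonlocal counter
        with lock:            # make the read-modify-write a single unit
            counter += 1

    threads = [threading.Thread(target=bump) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()         # wait for every thread before reading
    return counter
```

With the lock and the join in place the returned value always equals the number of threads, regardless of how the interpreter schedules them.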

But low-level module 'thread' is different. As documentation states:

This module provides low-level primitives for working with multiple threads (also called light-weight processes or tasks) — multiple threads of control sharing their global data space. For synchronization, simple locks (also called mutexes or binary semaphores) are provided.

I do not know what the above high-level stuff does, but low-level threads are created very quickly, forcing the python interpreter to eat gigabytes of virtual memory and then throw unknown exceptions at the end (practice shows that waiting for all threads to complete does not raise those exceptions; maybe this happens only in 2.5.2):

Unhandled exception in thread started by 
Error in sys.excepthook:

$ cat /tmp/low_level_threads.py 
import thread

test_num = 0
def test_thread(smth):
	global test_num
	test_num = test_num + 1

num = 10000
for x in xrange(num):
	thread.start_new_thread(test_thread, ('qwe',))

print "var: ", test_num, "num: ", num

This application ate about 60-100% of CPU so I wonder whether GIL allows multiple threads to run (and not wait in syscall) on different physical processors without hacks (i.e. explicit GIL state drop and acquire).
There are also nice lock classes in that module.

I plan to use a separate thread for each key in my piano keyboard emulator, which will be scheduled from the key-pressed signal.

August 08, 2010 03:59 PM

Harald Welte: Working on a document on smartphone hardware architecture

I've started to write up some information on modern smartphone hardware architecture. It will be in a similar style to what I previously wrote on feature phones and gsm modem hardware, but with a specific focus on smartphones, their multiple processors, memory sharing, AP/BP interface, audio architecture, etc.

I should have done this a long time ago. In fact, I think I should write more documents like that on various technical subjects. If you want to learn about low-level aspects of modern telephones, there is way too little published information out there.

August 08, 2010 02:00 AM

August 07, 2010

Matt Domsch: Dell at LinuxCon Boston



For the second year in a row, Dell engineers will be on hand at the Linux Foundation’s LinuxCon conference in Boston next week.  While I don’t get to fly a helicopter in the Penguin Bowl this year, we’ll have plenty of face time with the engineers and enthusiasts on hand.

On Wednesday at 10:30am, I’ll be presenting on Network Device Naming, which simplifies this:

PowerEdge R610 with 8 Ethernet ports

by letting the system administrator use better names for their network ports than “eth0”.   Can you guess which is eth0 in that picture?  (Hint: it might be green, it might be red, it might be orange and it may change from time to time.)

Shyam Iyer  follows me at 11:30am, presenting “Storage Provisioning with iSCSI for Virtualized Environments”, which describes the work he has been doing with the Open-iSCSI and libvirt teams to simplify iSCSI storage use by virtual machines, to take advantage of all the great hardware acceleration our EqualLogic arrays provide.

On Thursday at 2pm, I return to the stage in a panel moderated by Matt Asay, COO of Canonical, titled “What’s Next for Linux”, alongside James Bottomley of Novell, David Recordon of Facebook, and Ravi Simhambhatla of Virgin America.   I’m especially interested to be on this panel, as my cohorts are pushing the limits of computing, often with Dell’s help, and simultaneously Dell is active in the new worlds they’re creating.

See you in Boston next week!

August 07, 2010 04:17 AM

Harald Welte: On my way to Taiwan for COSCUP

Tomorrow early morning I'll be on my way to Taipei/Taiwan. The main reason for this trip is the invitation to speak at COSCUP 2010.

I'm really looking forward to getting back to Taipei, which has become something like my second home during the years I was working on Openmoko. I've really gotten used to life in this super-urban Asian metropolis... to the extent that I'm almost a bit homesick while I'm actually at home in Berlin/Germany.

August 07, 2010 02:00 AM

August 06, 2010

Evgeniy Polyakov: Python piano

Spent several hours writing a trivial piano emulator in Python. I do not have a book so I just googled whatever did not work: apparently all newbie problems are already solved.
I used PyQt bindings and Qt-designer to create the UI. pyuic4 generates somewhat ugly Python code: first, python 2.6 does not understand it (namely assigning a value to a function call, like self.some_func() = something), so I had to edit it manually (I found a post saying that the latest Python snapshots already support it; likely it was about Python3); second, it does not know about loops and just creates a static representation of the form screen.
Also I did not work hard on things like resizing (I just forbade it :)

To play sounds I used the PyMedia library. Unfortunately it is not available in Ubuntu and its compilation failed, so I had to edit the setup scripts and put in proper defines for the C code :) Fortunately I know C a little.

Probably I will switch away from PyMedia, since it does not allow controlling the output sound buffer size. I need to 'schedule' audio sample playing so that the qt-signal handler returns and the next one (like key release) can be processed. PyMedia implements async sound output, but that may not be enough - for example it blocks trying to get and play 100.000 samples.
Or I will try to work with Python threads and assign press/release signals to different threads (if that is possible, I do not know Python well enough yet).
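
One way to let the signal handlers return immediately is to have them only queue an event, and let a worker thread do the playing in small chunks. The sketch below uses the standard queue and threading modules in Python 3; NotePlayer and play_chunk are hypothetical names, and play_chunk stands in for whatever call writes one short buffer to the sound device (no PyQt or PyMedia API is used or implied).

```python
import queue
import threading

_STOP = object()  # sentinel that tells the worker thread to exit


class NotePlayer:
    """Plays held keys in short chunks on a worker thread."""

    def __init__(self, play_chunk):
        self._play_chunk = play_chunk      # callable(key): plays one chunk
        self._events = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def key_pressed(self, key):
        self._events.put(("press", key))   # returns at once

    def key_released(self, key):
        self._events.put(("release", key))

    def close(self):
        self._events.put(_STOP)
        self._worker.join()

    def _run(self):
        held = set()
        while True:
            try:
                # Sleep when idle; only poll while something is sounding.
                event = self._events.get(block=not held)
            except queue.Empty:
                event = None               # nothing new, keep playing
            if event is _STOP:
                return
            if event is not None:
                kind, key = event
                if kind == "press":
                    held.add(key)
                else:
                    held.discard(key)
            for key in sorted(held):
                self._play_chunk(key)
```

The chunk length bounds how long a key release can go unnoticed, which is the "short enough timeout" that matters for a piano.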

To date this is very ugly proof-of-concept code. I will polish things and split the downloaded scales into separate tones, which will be played when the appropriate keys are pressed. I will also try to implement nicer classes for keys and sounds.

And in the meantime, my workplace screenshot:

I found it so simple to program in Python that it is very likely I will switch all prototyping development from Lisp to Python. Lisp is great and I do like its features, but it was way too rare that I used its cool abilities like writing complex macros; most of my macros were rather simple with-something () ones like with-open-file() and friends.
Lambda and closures for internal states are great, but iirc Python also has them.

But the main feature of Python I like is its incredible standard library as well as zillions of already written extensions. I was not able to find graphical bindings or audio processing for CLISP, for example, although it's quite unusual to write a GUI in Lisp.
I had to write an HTTP GET/POST utility myself using low-level sockets; regular-expression support in CLISP is rather limited, and although cl-ppcre is great, rich regexps are available in Python out of the box. Python has some multithreading support (and the ugly GIL, to name it), while CLISP does not (at least version 2.44.1, which comes with Ubuntu Lucid).
And so on, and so on...

Stay tuned, I will play more with this project, after all I want to play some simple music at work :)

August 06, 2010 11:41 PM

Evgeniy Polyakov: Shooting at elliptics@libeblob with lots of data

We got 2 servers (4 SATA disk raid10, default ext4, 24 Gb of RAM, E5530, 16 cores), uploaded 30 millions of objects into elliptics network on top of libeblob backend. Total of 370 Gb on disks.

And got this with completely random requests:

900 rps within 100-150 ms. Quite a superb result for such hardware.
And we have these SAS shelves:

md4 : active raid10 sdr[13] sdq[12] sdp[11] sdo[10] sdn[9] sdm[8] sdl[7]
                               sdk[6] sdj[5] sdi[4] sdh[3] sdg[2] sdf[1] sde[0]
      2050780256 blocks super 1.2 16K chunks 2 near-copies [14/14] [UUUUUUUUUUUUUU]

which will be used to host about 200 million objects (the upload will be finished next week - the writing server can not cope with such load :). Will also test the same amount of data on 4-sata-disk raid10 machines.

Stay tuned!

August 06, 2010 10:03 PM

Matthew Garrett: AE_AML_BUFFER_LIMIT in \_SB_._OSC

If you get messages like this:

ACPI Error (psparse-0537): Method parse/execution failed [\_SB_._OSC] (Node ffff8801e8c62b30), AE_AML_BUFFER_LIMIT

then you've fallen foul of one of the less appealing aspects of ACPI. _OSC methods are defined as methods to allow the operating system and the firmware to handshake over their support of optional features. Different _OSC methods apply to different types of hardware. CPU _OSC methods allow the operating system to inform the firmware that it supports ACPI 3.0 throttling states, while PCIe _OSC methods allow the operating system to indicate that it can manage native PCIe hotplug. The firmware can choose whether or not to give up control of these features, and the OS then has to cope.

The problem arises when we get to the _OSC method on the system bus. This wasn't specified until ACPI 4.0, leaving an attractive mechanism for vendors to add OS/firmware integration. To that end, we now have at least three different _SB_._OSC methods in the wild:

  1. The ACPI specified _OSC. This exists purely for the OS to tell the firmware what it supports, without the firmware having the opportunity to disagree. As such it passes 8 bytes of data to the method.
  2. The Microsoft WHEA _OSC. This exists to allow the firmware and Windows to handshake over whether or not the firmware supports Microsoft's hardware error reporting.
  3. HP's PCC _OSC method, designed to allow handshaking between the OS and the firmware in order to determine whether they support OS interaction with the firmware-level CPU scaling.


This wouldn't be enough to be a problem in itself. The ACPI spec requires _OSC methods to have GUIDs in order to protect users from exactly this kind of situation - Windows can attempt to enable WHEA on a machine with a spec-compliant _OSC, and the _OSC method will return an immediate failure because the GUID doesn't match. Except that the WHEA and PCC versions of _SB_._OSC pass 12 bytes of data in the third argument, against the ACPI spec version's 8. And many _OSC implementations attempt to access the region between bytes 9 and 12 before checking the GUID, resulting in the AE_AML_BUFFER_LIMIT error.

This is made even more annoying due to the fact that argument 2 contains the number of parameters being passed in argument 3, making it straightforward to avoid this kind of failure. Firmware authors, tbh.
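
The ordering problem can be modelled in a few lines of Python. This is a toy model, not ASL and not any vendor's firmware: VENDOR_UUID is a made-up placeholder, the status values are assumptions chosen to resemble the "_OSC failure" and "unrecognized UUID" bits, and BufferLimitError stands in for AE_AML_BUFFER_LIMIT. The count is taken to be the number of 32-bit words in the capabilities buffer.

```python
import struct
import uuid

# Made-up UUID for a vendor-specific interface; not a real _OSC UUID.
VENDOR_UUID = uuid.UUID("12345678-1234-5678-1234-567812345678")

# Assumed status bits for the returned first DWORD.
OSC_FAILURE = 0x2
OSC_UNRECOGNIZED_UUID = 0x4


class BufferLimitError(Exception):
    """Stand-in for the interpreter's AE_AML_BUFFER_LIMIT."""


def read_dword(buf, index):
    """Read the index-th little-endian DWORD, failing on overrun."""
    if 4 * index + 4 > len(buf):
        raise BufferLimitError("read past end of %d-byte buffer" % len(buf))
    return struct.unpack_from("<I", buf, 4 * index)[0]


def osc_careless(caller_uuid, revision, count, caps):
    """Touches bytes 9-12 before checking who is calling."""
    read_dword(caps, 2)
    if caller_uuid != VENDOR_UUID:
        return OSC_UNRECOGNIZED_UUID
    return 0


def osc_careful(caller_uuid, revision, count, caps):
    """Checks the UUID and the word count before reading the third DWORD."""
    if caller_uuid != VENDOR_UUID:
        return OSC_UNRECOGNIZED_UUID
    if count < 3:
        return OSC_FAILURE
    read_dword(caps, 2)
    return 0
```

A caller using another UUID with an 8-byte buffer makes the careless version overrun the buffer, while the careful one simply reports that it does not recognise the UUID.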

August 06, 2010 08:44 PM

Valerie Aurora: I want an Elena Kagan t-shirt

It should be no surprise that I am thrilled – nay, bubbling over with effervescent happiness – that Elena Kagan was confirmed today to the U.S. Supreme Court. As of her swearing in on Saturday, the U.S. Supreme Court will, for the first time, consist of 33% women (well, technically, 33.3%).

As a rule, I wear very few shirts with words or logos on them, but I gleefully make an exception for Elena Kagan and/or the womaniest Supreme Court in history. What’s your suggestion for a t-shirt, hoody, bag, or other declaration of support for Elena Kagan and our shiny new Supreme Court? “Elena Kagan Rules” in stark white caps on a black babydoll is a good start. And can I get one in time for the Linux Storage and File Systems workshop on Sunday?


August 06, 2010 05:51 AM

August 05, 2010

Matt Domsch: Interview with Jared Smith, new Fedora Project Leader



I thought this was a well-done interview by Henry Kingman of Linux.com, welcoming new Fedora Project Leader Jared Smith.

I’ve been fortunate to serve on the Fedora Project Board since 2006, and to have the opportunity to work with several FPLs (Max and Paul directly, and their predecessors Michael, Christian, and Greg in various capacities), and I look forward to working with Jared even more now.  He brings a wealth of experience, talent, and enthusiasm that’s contagious.

I’m also quite pleased with the way the transitions between FPLs have been handled.  Both Max and Paul knew for themselves when they were ready for new challenges – not that they were “burned out” (e.g. CATB lesson #5), or that they were no longer being effective, but realized that they could apply their talents towards Fedora in new ways, while opening new opportunities for another talented and respected contributor.  That’s a big part of building a healthy community.

August 05, 2010 09:45 PM

Ulrich Drepper: Cancellation and C++ Exceptions

In NPTL, thread cancellation is implemented using exceptions. In general this does not conflict with mixing cancellation and exceptions in C++ programs; the two work together just fine. Some people, though, write code which doesn't behave as they expect. This is a short example:

#include <cstdlib>
#include <iostream>
#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t c = PTHREAD_COND_INITIALIZER;

static void *tf (void *)
{
  try {
    ::pthread_mutex_lock(&m);
    ::pthread_cond_wait (&c, &m);
  } catch (...) {
    // do something
  }
}

int main ()
{
  pthread_t th;
  ::pthread_create (&th, NULL, tf, NULL);
  // do some work; simulate using sleep
  std::cout << "Wait a bit" << std::endl;
  sleep (1);
  // cancel the child thread
  ::pthread_cancel (th);
  // wait for it
  ::pthread_join (th, NULL);
}

The problem is in function tf. This function contains a catch-all clause which does not rethrow the exception. One can expect to run into code like this, but it should really never appear anywhere. The rules C++ experts developed state that catch-all cases must rethrow. If they do not, strange things can happen, since one doesn't always know exactly what exceptions are thrown. The code above is just one example. Running it makes the program abort:

$ ./test
Wait a bit
FATAL: exception not rethrown
Aborted (core dumped)

The exception used for cancellation is special: it cannot be ignored. This is why the program aborts.

Simply adding the rethrow will cure the problem:

@@ -13,6 +13,7 @@
     ::pthread_cond_wait (&c, &m);
   } catch (...) {
     // do something
+    throw;
   }
 }
 

But this code might not have the expected semantics. Therefore the more general solution is to change the code as such:

@@ -1,6 +1,7 @@
 #include <cstdlib>
 #include <iostream>
 #include <pthread.h>
+#include <cxxabi.h>
 
 static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
 static pthread_cond_t c = PTHREAD_COND_INITIALIZER;
@@ -11,6 +12,8 @@
   try {
     ::pthread_mutex_lock(&m);
     ::pthread_cond_wait (&c, &m);
+  } catch (abi::__forced_unwind&) {
+    throw;
   } catch (...) {
     // do something
   }

The header cxxabi.h comes with gcc since, I think, gcc 4.3. It defines a special tag which corresponds to the exception used in cancellation. As already said, this exception cannot be caught and discarded, which is why it is called __forced_unwind.

That's all that is needed. This code can easily be added to existing code, maybe even hidden behind a single macro:

#define CATCHALL catch (abi::__forced_unwind&) { throw; } catch (...)

This macro can be defined predicated on the gcc version and the platform.
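
As a sketch of what such a predicated definition might look like: the guard below assumes that abi::__forced_unwind is available from gcc 4.3 on (the version guessed above) and that only Linux targets need it. Both conditions are assumptions to adjust for the toolchain at hand, and handles_ordinary_exception is a hypothetical helper that only shows ordinary exceptions still reaching the catch-all body.

```cpp
#include <stdexcept>

// Assumption: abi::__forced_unwind exists from gcc 4.3 on, and only
// Linux (NPTL) implements cancellation this way.
#if defined(__linux__) && defined(__GNUC__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
# include <cxxabi.h>
# define CATCHALL catch (abi::__forced_unwind&) { throw; } catch (...)
#else
# define CATCHALL catch (...)
#endif

// Hypothetical helper: ordinary exceptions are still handled by the
// catch-all body; only the cancellation exception is passed on.
static bool handles_ordinary_exception ()
{
  try {
    throw std::runtime_error ("not a cancellation");
  } CATCHALL {
    return true;
  }
  return false;
}
```

On other compilers or platforms the macro falls back to a plain catch-all, so existing code keeps compiling unchanged.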

I still think it is better to always rethrow the exception, though.

August 05, 2010 01:38 AM

August 04, 2010

Evgeniy Polyakov: I am a musician

A fucking bad musician actually, but I learn...

Yes, there are harmonies and scales, chords and arpeggios and so on, which I study every day as well as practicing my trumpet itself. And I have quite good progress I think.

But very frequently I want to play some simple (or complex) motif of the melody I hear. Or 'workaround' it a little... At home I can play piano and try to pick out the melody by ear, but in the office or anywhere near a computer I can not do this.

The reason is simple: Linux does not have a piano synthesizer. Well, it has plenty of them, but each one plays through MIDI. And the audio system in Linux is only good for listening. MIDI/alsa/oss/jackd/pulseaudio - thousands of them, and this web only works in some precise setups.

I tried to install plenty of MIDI software on Ubuntu Lucid Lynx - epic fail. Jackd stops, pulseaudio crashes and freezes... It is such a mess.

So I decided to write a trivial application which will look like a piano keyboard and play recorded samples instead of going through a MIDI synth when keyboard or mouse keys are pressed. I checked FreshMeat - there are only MIDI synthesizers there.

And I need a simple, fast, nice and non-brain-fucking in setup application, which just plays what I press.

August 04, 2010 10:02 PM

Evgeniy Polyakov: Presentation bit: Elliptics network implementation details

5. Elliptics network implementation details: data redundancy, fault tolerance, transactions, versions and snapshots, data deduplication and so on.

Elliptics network is a distributed key/value storage. Having more keys associated with the same data being written means that multiple copies of the written block will exist in the storage. Using data hashes to generate keys and having multiple hash functions registered for given block we end up having multiple copies of the data.

Knowing the ID distribution among nodes in the storage, it is possible to tweak the hash generation function so that it produces a hash which belongs to the node of interest. This is a rather complex task to implement on the client, so instead we introduced virtual datacenters.

A virtual datacenter is a set of nodes combined into a logical group; the nodes may or may not actually be physically grouped together. The system modifies the hash generated by the common function by changing its first 4 bytes to be equal to the datacenter number.

Thus the VDC feature allows specifying a preconfigured prefix, the first 4 bytes, in every transaction ID. Nodes whose first ID bytes equal the VDC number will receive all transactions indexed by the appropriate hash function.

Here is a virtual datacenter example.

Let's suppose we have two transformation functions set up on the client: dc1_sha1 and dc2_sha1. They produce the sha1 hash of the data with the first bytes set to 1 and 2 respectively.

Now let's suppose we add two nodes with IDs equal to 0x01000000 and 0x01000080 and then two nodes with IDs 0x02000000 and 0x02800000. Every write now results in two transactions, one per transformation function above, with the first 4 bytes set to 1 or 2 respectively, so there will be a guaranteed copy in each virtual datacenter.
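
The prefix rewrite can be sketched roughly as follows. This is not the Elliptics code: the names object_id, apply_vdc_prefix, dc1_id and dc2_id are made up for illustration, the SHA-1 digest is assumed to be computed elsewhere, and the byte order of the 4-byte prefix (least significant byte first, so that datacenter 1 gives 01 00 00 00) is an assumption.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// A 20-byte ID, the size of a SHA-1 digest.
typedef std::array<std::uint8_t, 20> object_id;

// Overwrite the first 4 bytes of a hash with the datacenter number.
// Assumption: the number is stored least significant byte first, so
// datacenter 1 yields the prefix 01 00 00 00.
object_id apply_vdc_prefix (object_id hash, std::uint32_t dc)
{
  for (std::size_t i = 0; i < 4; ++i)
    hash[i] = static_cast<std::uint8_t> ((dc >> (8 * i)) & 0xff);
  return hash;
}

// Hypothetical stand-ins for dc1_sha1 and dc2_sha1; the caller passes
// in an already computed SHA-1 digest of the data.
object_id dc1_id (const object_id &sha1) { return apply_vdc_prefix (sha1, 1); }
object_id dc2_id (const object_id &sha1) { return apply_vdc_prefix (sha1, 2); }
```

Both IDs share the last 16 bytes of the digest and differ only in the prefix, which is what steers each copy to a different virtual datacenter.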

Now we can implement multiple VDCs with a different ID prefix for every datacenter we want to work with. If we only need 2 copies but there are more than 2 datacenters, nodes can be spread between those VDCs. '2' is arbitrary here of course.

This feature also allows requests to be linked geographically. Suppose some application receives a data read request from Moscow; it can check whether its set of transformation functions contains the one with the ID assigned to Moscow. If there is such a function, it can be used first to obtain the object ID and to fetch the object from Moscow-local servers instead of going to New York, where the main datacenter lives. If the Moscow storage does not contain the requested object, we use the second transformation function(s), which 'point' to the main storage cluster.
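
The local-first lookup might be sketched like this. Everything here is hypothetical: a transformation is modelled as a function from key to object ID, the whole network as one in-memory map keyed by ID, and read_object is an invented name rather than an Elliptics call.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical types: a transformation maps a key to an object ID, and
// the storage network is modelled as a plain in-memory map.
typedef std::function<std::string (const std::string &)> transform;
typedef std::map<std::string, std::string> storage;

// Try each transformation in order (the local one first); return true
// and fill `out` at the first ID that is present in the storage.
bool read_object (const std::string &key,
                  const std::vector<transform> &transforms,
                  const storage &nodes, std::string &out)
{
  for (const transform &t : transforms) {
    storage::const_iterator it = nodes.find (t (key));
    if (it != nodes.end ()) {
      out = it->second;
      return true;
    }
  }
  return false;
}
```

Putting the Moscow transformation first in the vector gives the behaviour described above: the main cluster is only asked when the local ID is missing.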

5.1. Fault tolerance in elliptics network.

In the meantime I was told that there will be a special man who will teach me how to make presentations...
They got me. And I did not yet read what I wrote several days ago :)

August 04, 2010 09:50 PM

Pavel Machek: TI Chronos status

My firmware is at sourceforge (project mychronos), newest versions are in git. It compiles/runs on both emulator and hardware. The emulator is an extremely tricky hack, but does the job and is usable for UI development. It needs root permissions. (Is there even an emulator for Windows?)

If you have a newer version of the watch/AP with working RFBSL, it is easy to do development fully on Linux:

You'll need:

* Chronos CC (to flash txt files to watch)

* mspgcc - I used debootstrap to generate unstable chroot, then unpacked binaries from http://losinggeneration.homelinux.org/wp-upload/msp430/ . I had to use apt-get to install one more library, but that was easy.

(Improvements to this "HOWTO" welcome :-).

August 04, 2010 08:51 PM

Eric Sandeen: Cycling towards Socialism!

Aw, nice bike!  Looks innocuous, right?  NO!  That red bike is the friendly looking twin of the black helicopter!

Republican gubernatorial candidate Dan Maes is warning voters that Denver Mayor John Hickenlooper’s policies, particularly his efforts to boost bike riding, are “converting Denver into a United Nations community.”

“This is all very well-disguised, but it will be exposed,” Maes told about 50 supporters who showed up at a campaign rally last week in Centennial.

From this Denver Post article, we learn that downtown bike-sharing in Denver is in fact an insidious plot by the UN to take away our freedoms.

Sadly, it’s happening near me too.

Thank god for clear-headed Tea Party candidates like this, who will protect us from such evils!

“At first, I thought, ‘Gosh, public transportation, what’s wrong with that, and what’s wrong with people parking their cars and riding their bikes? And what’s wrong with incentives for green cars?’ But if you do your homework and research, you realize ICLEI is part of a greater strategy to rein in American cities under a United Nations treaty,” Maes said.

(And remember, the Red Dawn invasion started in Colorado.  Don’t let it happen again!  And did I mention that the bike is…. RED?  Cue Glenn Beck blackboard scribble)

August 04, 2010 05:11 PM

David Woodhouse: 4 Aug 2010

A while ago, I reluctantly took over maintaining the get_iplayer tool: http://www.infradead.org/get_iplayer/.

When you search for it by name, my page isn't very high up on the list of results. Hopefully a few syndicated links to it from various planets that carry this will help...

August 04, 2010 12:46 PM

Harald Welte: Playing more with Erlang

Last year I started to occasionally play with Erlang. People who know me as a die-hard C coder who tries to avoid C++, Java and Python wherever possible will probably be surprised by this.

I have no intention of changing my general position on programming languages. I don't feel comfortable using something where I don't know and/or understand the immediate impact on how this code will be executed on the actual silicon.

However, if you have a need to play with anything that uses ASN.1, but particularly the aligned/unaligned PER encoding variants, then it is pretty clear that there is nothing available as Free Software that can compare to the Erlang asn1ct/asn1rt modules.

At that time last year I was doing some rapid prototyping with the RANAP protocol, and the progress was quite quick. I never had time to return to that project, so it (and my Erlang skills) were left dormant.

In recent weeks, I have picked Erlang up again - again to work on ASN.1 encoded messages: This time TCAP and MAP. While we still need the in-progress TCAP+MAP implementation in C for OsmoSGSN, there are other tasks at hand where an Erlang-based implementation might yield a much higher productivity.

So right now I'm working on a program that parses/decodes and iterates through every MAP component in a TCAP message and replaces certain fields, re-encodes the entire message and sends it off the wire. Once that is done, I think I'll actually try to do a more complete TCAP server and implement a simplistic HLR for OsmoSGSN testing.

August 04, 2010 02:00 AM

Harald Welte: Official wiki page on GSMTAP created

I came up with GSMTAP about two years ago while working on airprobe. The goal was to have something similar to what radiotap does in the wifi world: A pseudo-header that adds additional information and context that is not present in the actual message.

Initially, GSMTAP was intended to be a separate link-layer type in the pcap file format, but this would preclude its use in real-time protocol analysis. So I modified it to be encapsulated in UDP packets, which are sent and received using normal UDP/IP sockets.
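
To illustrate the idea of a pseudo-header travelling in front of the captured message inside one UDP payload, here is a minimal sketch. The header below is hypothetical and deliberately simplified; it is NOT the real GSMTAP layout, and pseudo_header and encapsulate are names invented for this example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified pseudo-header; NOT the real GSMTAP layout.
struct pseudo_header {
  std::uint8_t version;        // format version
  std::uint8_t timeslot;       // timeslot the message was seen on
  std::uint16_t arfcn;         // channel number
  std::uint32_t frame_number;  // frame number
};

// Build one datagram payload: the header fields in network byte order,
// followed by the unmodified message.
std::vector<std::uint8_t> encapsulate (const pseudo_header &h,
                                       const std::vector<std::uint8_t> &msg)
{
  std::vector<std::uint8_t> out;
  out.push_back (h.version);
  out.push_back (h.timeslot);
  out.push_back (static_cast<std::uint8_t> (h.arfcn >> 8));
  out.push_back (static_cast<std::uint8_t> (h.arfcn & 0xff));
  for (int shift = 24; shift >= 0; shift -= 8)
    out.push_back (static_cast<std::uint8_t> ((h.frame_number >> shift) & 0xff));
  out.insert (out.end (), msg.begin (), msg.end ());
  return out;
}
```

The resulting buffer would simply be handed to sendto() on an ordinary UDP socket, which is what makes live analysis possible; a receiver strips the fixed-size header and decodes the rest as the original message.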

Over recent years, GSMTAP has not only been integrated into multiple programs of the airprobe project, but is also understood by wireshark. OpenBTS has also decided to adopt the format and can generate GSMTAP messages for debugging purposes.

After we created OsmocomBB, it was very quickly taught how to generate GSMTAP messages, too.

So by now, at least when it comes to Free Software, it is definitely the de-facto standard for capturing/transmitting and analyzing protocol messages from the GSM air interface.

However, until now, there has never been any official "homepage" of the GSMTAP header. This has changed: the GSMTAP homepage is now part of the OsmocomBB wiki.

August 04, 2010 02:00 AM

August 03, 2010

Linus Torvalds: "13744 supplied"

Yeah, I'm not talking about the number of burgers McD supplies each second or anything like that. No, I'm talking about our fully automatic coffee maker. I just did a cleaning cycle, and ended up looking up what the coffee count was.


I may have an addiction problem.

Now, admittedly this is a coffee maker that we've had for something like 8 years, but that's still 4.7 doubleshots of coffee supplied every day for those eight years (you can ask for a single or a double, and the counter just counts "events").

Ok, I lie. The coffee maker counts the different types of coffee it makes separately, and "only" about two thirds of the events are actually double-shots. But I make that up by still supporting Starbucks (and Peet's) enough to make up the difference. And Tove accounts for about half, so when I say that I have an addiction problem, I should probably have said "we".

So every time I see some piece of medical research saying that caffeine is good for you, I high-five myself. Because I'm going to live forever.

August 03, 2010 02:28 PM

Pete Zaitcev: Wow, Lennart

Someone observed on the Fedora devel list that "there was not fixed a single bug last six months (since 2010-02-23)" in PulseAudio, and asked if it was dead. Lennart replied:

Also, check upstream git to figure out whether a project is dead. The last commit there is from 2 weeks ago. Git can tell you something about whether development happens. Bugzilla just tells you whether I have devoted my life to processing bug reports. And well, I haven't done that. Sorry that I don't exclusively spend my time on making my stats on bugzilla look pretty. If I did, then I would not get any real work done anymore.

Now I worry about the future of systemd and the hazards of relying on abandonware for key system functions.

P.S. Pulse works very well for me in F13. Which is great, because it looks like I'd be out in the cold if it did not.

August 03, 2010 03:23 AM

August 02, 2010

Jesse Barnes: using kdb on KMS

It’s not quite upstream yet, but there’s enough code out there for it to be useful. And by describing it maybe a few contributors will step forward to help with the remaining pieces. :)

First off, you’ll need Jason’s KGDB tree with a few fixes from me on top. I’ve collected them all at git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/kgdb-2.6.git in the kdb-kms branch.

Go ahead and build that kernel, and make sure you have CONFIG_KGDB_KDB=y and CONFIG_KDB_KEYBOARD=y set in your kernel config. After installing the kernel, add “kgdb=kms,kdb” to your boot command line. This enables the KMS enter/exit hooks and allows KDB to be driven from the locally attached keyboard.

Once you’ve rebooted, you should be able to enter KDB using SysRq-G or “echo g > /proc/sysrq-trigger”, or by hitting a bug or breakpoint. Resuming from where you left off is as simple as typing ‘go’ from the kdb prompt.

The above should work ok for simple cases today, but there are several outstanding issues:

  1. console unblank support - currently when you enter KDB it will try to unblank the console. Since this path takes locks in the console, fb and drm layers, it can cause problems. Fixing this shouldn’t be too hard though; the kdb enter hook should take care of actually unblanking the console (e.g. if it had been DPMS’d off before), so all KGDB needs to do is make sure console I/O is enabled, which is a smaller console and fb bookkeeping activity. So the console hook needs to be split and the fb enter hook needs to handle preparing fbcon for I/O.
  2. cursor save/restore - right now, when you enter KDB you’ll still see the cursor if it was enabled when you entered. Saving and restoring cursor state should be handled by the fb hooks, but the DRM hooks currently don’t bother.
  3. driver support - I’ve only tested on i915, but adding radeon and other KMS support should be fairly straightforward. Likewise, adding support for plain fb drivers should be pretty easy as well, they just need a small function to write the scanout base register, ideally to the previously allocated fbcon memory location.
  4. enhance the DRM KMS layer to allow the reservation of a dedicated debug crtc, encoder and connector tuple - this would allow keeping the kernel console active on e.g. the VGA port, making debugging of desktop applications and the graphics stack easier.

That’s it for now, hopefully we can get at least some of this merged for 2.6.36 (the fb and DRM changes in particular are very small).

August 02, 2010 07:38 PM

Evgeniy Polyakov: Elliptics network article for Linux Kongress 2010

The purpose of Elliptics network storage is to allow users to access a set of physically distributed servers through a flat addressing model in a decentralized network environment. Key/value distributed storage provides an efficient method of accessing data with a limited set of constraints. As proof that such functionality is useful in real-life scenarios we present a practical implementation of the DHT storage server with modular IO backends on top of common filesystems or databases and various frontends varying from a POSIX interface to an HTTP access mode. We will discuss limitations faced with the distributed hash table approach and compare them to functionality provided by centralized storage systems, namely high-performance data access and its high availability in a fault-prone environment. Based on the practical results and flexibility of the implemented storage model we will highlight possible new functionality and the ways it could be implemented in the discussed system.

This is the abstract. I will post some blocks here, but the whole article will be available when LK is over.

August 02, 2010 11:03 AM

August 01, 2010

Harald Welte: On the recent news items about the homebrew IMSI-catcher for 1500 USD

Some news sites seem to do very limited research and present it as big news that you can now build an IMSI-Catcher for a budget of USD 1500, using OpenBTS and a USRP.

Let me bring some clarity into this situation:

Also, the theoretical basics of how to operate an IMSI catcher are nothing new either. There are even a number of patents covering IMSI catchers; the first that I know of was patented by Rohde & Schwarz in 2003. Also, see this blog post by OpenBTS founder David Burgess on this topic.

So all that you ever needed is a bit of hardware and software to send radio waves containing messages formatted the way they are described in the (equally public) GSM specifications as published by ETSI and 3GPP. Commercial, proprietary systems have existed for a decade. Since 2008, there has been some Free / Open Source Software to operate GSM networks. The situation remains unchanged in 2010.

So please, remember this the next time somebody is trying to tell you that this is the greatest invention since sliced bread.

August 01, 2010 02:00 AM

July 31, 2010

Harald Welte: Dieter Spaar has started a blog

Dieter Spaar, who has been involved in various ways with both OpenBSC and OsmocomBB has just started a blog. This is good news and I hope this way he will get a bit more (much deserved) exposure on his great work.

July 31, 2010 02:00 AM

Harald Welte: GSM Denial of Service by flooding BTS with RACH requests

At Blackhat US 2010, there was a talk that (among other things) apparently included the subject of a RACH DoS on GSM base stations, implemented using my Layer1 of the OsmocomBB software.

As some news sites are covering this as "news": This vulnerability has been long known in the field and was - to the best of my knowledge - first demonstrated to a public audience by Dieter Spaar at the Deepsec 2009 conference in November 2009. You can get his slides.

The difficult part for many years has not been to know about the possibility of this weakness. Anyone who has read the GSM air interface specification will inevitably see that there is a limited number of RACH slots and a limited number of dedicated channels. Once you fill more RACH slots than the cell has dedicated channels, and you keep re-filling them at a higher rate than the cell can expire those dedicated channels, you have a DoS.

So rather, the difficult part was to implement it in practise, as traditionally all GSM baseband chipsets have been extremely closed, just like the very software (firmware) running on them. Today, starting from Q2/2010, it is very easy to do a proof-of-concept implementation, as we have created OsmocomBB: An Open Source baseband firmware.

Dieter Spaar's implementation predates OsmocomBB development by the better part of a year. At that time, he had to resort to binary-patching existing proprietary (binary-only) baseband firmware. So I think people should recognize his effort in doing the first practical implementation of that attack.

July 31, 2010 02:00 AM

July 30, 2010

Pavel Machek: Electrical collar

Repairing electrical dog collar is a lot of fun, especially if you succeed... but it kind of trains you not to succeed. It also raises some ethical issues with testing.

Well, maybe... one of my coworkers asked if it is my new headlamp when he saw that. So perhaps human testing is a possibility.

July 30, 2010 10:11 PM

David Woodhouse: 30 Jul 2010

Got bored of having to run 'make install' when hacking on Evolution, partly because libtool insanity makes it take too long — as for some reason it relinks everything as it installs it. Perhaps that was needed for FORTRAN77 programs on OSF/1, but it isn't needed on my modern Linux system. I hate libtool. But even without that, re-running 'make install' every time you change a line of code is a pain.

For a while I took to manually symlinking the libraries and executables I was working on, from my build directory into their installed locations. But I kept missing some out and that was a pain too.

My current solution, which excited mbarnes sufficiently that I felt I ought to share it more widely, is to re-run autogen.sh with the --enable-fast-install argument, then build it and run 'make INSTALL=install_symlink.sh install'. Then all files get installed as symlinks instead of being copied, and all I have to do is hack code, type 'make', and run evolution again.

The script is a dirty hack and there are much better ways to do it — some of which would even cope with filenames that have spaces in. But it works for me, and makes Evolution hacking a little easier.

July 30, 2010 03:03 PM

Harald Welte: A real-world practical A5/1 attack using airprobe and Kraken

At Blackhat USA 2010, Karsten Nohl presented a practical real-world A5/1 cracking attack. In recent years, Karsten, myself and others have been speaking at various opportunities, indicating that a practical attack using readily-available information and tools from the Internet is very possible, and that it is only a matter of time before somebody actually does it.

While Karsten has focused on the actual cryptographic attack, I've been putting in some time in projects like airprobe (a GSM receiver/decoder).

Now finally, a team of friends at the new Security Research Labs (founded by Karsten) in Berlin has put the pieces of the puzzle together.

Airprobe has been extended to fully support decoding of TCH/F (FACCH, SACCH and traffic), as well as SDCCH/SACCH control channels, and to specify the timeslot and physical channel configuration from the command line. Using this, you can

The external program to recover the A5/1 ciphering key is called Kraken and is also available from the SRLabs website.

So what are the limitations? Well, so far this only works on non-hopping cells with a single ARFCN. The limitations are those of the receiver hardware (and SDR software), and not really limitations of the airprobe GSM decoder or the actual software tools.

In the past I would have assumed that non-hopping and/or single-ARFCN cells are rare, but in fact we can find them even inside a big city like Berlin, from at least two of the four German GSM operators. So that's why this attack is very practical, no matter what the GSMA might say.

July 30, 2010 02:00 AM

Dave Jones: SELinux on low memory systems.

My router is a pretty underpowered machine. It has 512MB of RAM, and its ‘disk’ is a 2GB flash card on a CF to ATA adaptor (read as: really slow). But given its job is just routing packets 99% of the time, neither of these deficiencies is an issue.

Aside from one problem. Every time I did a yum update that pulled in an selinux policy update, it would consistently exhaust all the ram in the machine. I filed a bug on this, and as usual, Dan Walsh dropped some selinux knowledge that I had no idea about.

You can customize the bzip block size and “small” flag via
/etc/selinux/semanage.conf. After applying you can add entries like these to
your /etc/selinux/semanage.conf to trade off memory vs disk space (block size)
and to trade off memory vs runtime (small):

bzip-blocksize=4
bzip-small=true

You can also disable bzip compression altogether for your module store
via:
bzip-blocksize=0

Since I put that first tweak in place, it’s survived several policy updates without a hiccup.

July 30, 2010 01:15 AM