Kernel Planet

May 11, 2008

Jaya Kumar: Bangalorites, did you vote?

So I read that barely half of you did. "Bangalore 52.14 percent". Small wonder that the infrastructure and traffic are so "terrific" if only one out of every two people cares enough to go and drop a ballot. Fix things please. :-)

May 11, 2008 06:00 PM

May 09, 2008

Dave Jones: On reading books.

I'm almost ashamed to admit that last year I think I only read (as in completely, cover to cover) 3 or 4 books. This year doesn't seem to be going much better either: I'm still slowly making my way through my first of the year. The problem isn't that I'm a slow reader. My problem is one of choice. I buy a book, start reading it, and when I'm halfway through, for whatever reason, I end up with another new book, start reading that one, and the cycle repeats. I'm great at starting books, awful at finishing them. I think I must have started reading about 20-30 books last year alone. I only seem to really get through books when I'm traveling, because not having a bookcase full of options with me confines me to one or two choices. This is the reason that devices like the Kindle (or any other e-book reader, for that matter) are a really bad idea for me. When overwhelmed with choice, my indecision consumes me. (Seriously, you should see me fail at choosing a lunch option when I'm in a new city surrounded by restaurants. Even the infamous Google cafeteria was total overload for me. I think if I worked there, my indecision would actually lead to me starving to death before I made a lunch choice.)

Last November, whilst in Portland, OR, I picked up a copy of The Soul of a New Machine. It's a book I've been meaning to read for years, after countless recommendations from friends. Same old story though: I'm dragging my heels getting through it. It's not that it isn't an interesting book. It's really interesting to me how many parallels there are in the story to things that are happening, or have happened, at Red Hat.

I think I'm about halfway through it. Seven months. Perhaps I need to travel more.

May 09, 2008 07:21 PM

Matthew Garrett:

My previous entry was somewhat misleading in one respect - the discussion of the power consumption of a downclocked processor. The problem is that nowadays, halving your CPU frequency doesn't halve the power consumption (see the figures in Arjan's slides from OSCON last year, for instance). I'm assuming that this is due to the cache size on modern hardware being sufficiently large that it dominates the power consumption of the processor. Dropping the frequency doesn't reduce the amount of power required to keep the contents of the cache alive, so the saving is less than you'd expect. Deeper C states disable the cache and save much more power.

So, if halving your speed means everything takes twice as long but doesn't even halve your power consumption, what's the point in having P states at all? There's a certain amount of latency and power involved in moving between C states, and if the choice is between rapidly cycling between full speed and C4 or just sticking at low speed and maybe dropping into C1 or C2, then executing code at the lower performance state may be beneficial. The ondemand governor takes this into account by looking at the amount of load on the processor over time, so if this doesn't hit a threshold value it'll assume that you're better off staying at the lower performance level.
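A toy sketch of the "look at the load over time and compare it with a threshold" idea, in shell, assuming Linux's /proc/stat "cpu" line format. The function names, the 80% threshold and the two-way max/low outcome are my own simplifications for illustration, not the ondemand governor's actual code or tunables:

```shell
#!/bin/sh
# Hypothetical, much-simplified model of an ondemand-style decision. This is
# NOT the kernel's cpufreq_ondemand code; it only illustrates "measure the
# load over an interval and compare it against a threshold".

# cpu_times LINE: print "<idle> <total>" jiffies from a /proc/stat "cpu" line.
# Only the idle column (4th value) counts as idle; iowait etc. count as busy.
cpu_times() {
    echo "$1" | awk '{ t = 0; for (i = 2; i <= NF; i++) t += $i; print $5, t }'
}

# load_pct SAMPLE0 SAMPLE1: busy percentage between two samples.
load_pct() {
    set -- $(cpu_times "$1") $(cpu_times "$2")   # idle0 total0 idle1 total1
    dt=$(( $4 - $2 ))
    di=$(( $3 - $1 ))
    if [ "$dt" -le 0 ]; then echo 0; return; fi
    echo $(( 100 * (dt - di) / dt ))
}

# Illustrative threshold; an assumption, not a documented kernel default.
UP_THRESHOLD=${UP_THRESHOLD:-80}

# choose_speed LOAD: "max" if the load reaches the threshold, else "low".
choose_speed() {
    if [ "$1" -ge "$UP_THRESHOLD" ]; then echo max; else echo low; fi
}

# Example against a live system (Linux only):
#   s0=$(grep '^cpu ' /proc/stat); sleep 1; s1=$(grep '^cpu ' /proc/stat)
#   choose_speed "$(load_pct "$s0" "$s1")"
```

With a sample interval in which half the jiffies were busy, the load comes out at 50% and the sketch stays at the lower speed.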

May 09, 2008 03:25 PM

May 08, 2008

Matthew Garrett:

Modern CPUs are great. They have all sorts of advanced power saving features, which is one of those nice cases where everyone can save money, gain performance and claim environmental credentials at the same time. Everyone's a winner.

Well. Everyone's a winner as long as your software doesn't suck.

I've talked about the benefits of the tickless kernels and reducing wakeups and spending longer in deep C states before, so if you don't know about them then go and read that first. This time I'm going to focus on a different level of hardware, and a different level of suck.

For a long time, laptops supported changing the speed of processors when switching between AC and battery. CPU power consumption is proportional to frequency, so dropping the frequency meant a longer battery life. Of course, it also meant that it took longer to get anything done; the reason this was still a win was that CPUs in those days consumed just as much power when idle as when running. Transmeta introduced a technology called LongRun with their Crusoe processors, bringing the ability to drop both the frequency and the voltage of the CPU simultaneously. With power consumption being proportional to the square of the voltage, even a small drop resulted in worthwhile power savings. As the only really worthwhile thing Transmeta brought to the x86 world[1], this was unsurprisingly ripped off by everyone else. Intel introduced Enhanced SpeedStep, AMD gave people PowerNow! and VIA have LongHaul.
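Both proportionality claims above come from the usual first-order model of CMOS switching power, P ≈ C V² f, where C is the effective switched capacitance. A worked form, with α and β as my own illustrative names for the frequency and voltage scale factors, and leakage ignored:

```latex
P_{\mathrm{dyn}} \approx C\,V^{2} f,
\qquad
\frac{P'_{\mathrm{dyn}}}{P_{\mathrm{dyn}}}
  = \frac{C\,(\beta V)^{2}\,(\alpha f)}{C\,V^{2} f}
  = \alpha\,\beta^{2}
% e.g. halving f (\alpha = 0.5) while dropping V by 20% (\beta = 0.8):
% \alpha\beta^{2} = 0.5 \times 0.64 = 0.32
```

So the voltage term is what made frequency scaling pay off; under this model, the "about 20%" figure used in footnote [3] corresponds to β ≈ 0.63.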

Obviously, reducing the frequency of the CPU increased battery life. Everyone's happy?

No[2].

The problem is that nowadays, processors don't consume as much energy when they're idle as when they're running. The aforementioned C states mean that an idle processor consumes a tiny percentage of a loaded one - an ultra-low voltage Intel part will draw on the order of a watt. Executing code, even at the lowest voltage and frequency, will draw far more power. Obviously, we want to keep the processor idle for as long as possible. The easiest way to do this would be to never run anything, but that's not a real option. The alternative is to run when we have to, but make sure that we get it over with as quickly as possible so we can return to the idle state. Counterintuitively, that means switching to the highest voltage and frequency, executing the code and then dropping back into the idle state. By going faster, we save power[3].

In summary, the only sensible way to use a CPU is to run it as fast as possible in order to let it idle as much as possible, and drop the frequency and voltage when it's not doing anything. The. Only. Sensible. Way.

Some people write software that lets you choose different power profiles depending on whether you're on AC or battery. Typically, one of the choices lets you reduce the speed of your processor when you're on battery. This is bad. It is wrong. The people who implement these programs are dangerous. Do not listen to them. Do not endorse their product and/or newsletter. Do not allow your eldest child to engage in conjugal acts with them. Doing this will reduce your battery life. It will heat up your home. It will kill baby seals. The sea will rise and your car will float away. If you are already running it, make sure that it always sets your cpufreq governor to ondemand and does not limit the frequencies in use. Failure to do so will result in me setting you on fire[4].
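If you are stuck with such a tool, the end state recommended above (ondemand, no frequency limits) can be set directly through the kernel's cpufreq sysfs files. A minimal sketch, assuming the standard per-CPU files under /sys/devices/system/cpu/cpuN/cpufreq (scaling_governor, scaling_available_governors, scaling_min_freq, scaling_max_freq, cpuinfo_min_freq, cpuinfo_max_freq); reset_cpufreq and CPUFREQ_ROOT are names invented for this example:

```shell
#!/bin/sh
# Sketch: switch every CPU to the ondemand governor and remove any frequency
# limits via the cpufreq sysfs files. Needs root on a real system.
# CPUFREQ_ROOT exists only so the function can be pointed at a scratch
# directory for testing; it defaults to the usual sysfs location.
CPUFREQ_ROOT=${CPUFREQ_ROOT:-/sys/devices/system/cpu}

reset_cpufreq() {
    for dir in "$CPUFREQ_ROOT"/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue   # glob matched nothing, or no cpufreq here
        # Switch the governor only if ondemand is offered for this CPU.
        case " $(cat "$dir/scaling_available_governors") " in
            *" ondemand "*) echo ondemand > "$dir/scaling_governor" ;;
        esac
        # Open the allowed range up to the full hardware range.
        cat "$dir/cpuinfo_max_freq" > "$dir/scaling_max_freq"
        cat "$dir/cpuinfo_min_freq" > "$dir/scaling_min_freq"
    done
}
```

Run it as root after anything that might have pinned the governor or capped the frequency.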

The only legitimate reasons for limiting the speed of your CPU are to avoid overheating (which should be fixed in the kernel, really - having userspace in charge of ensuring the continued functioning of the machine is madness) or to make the machine quieter. And if you want your machine to be quieter, there should be a tickbox marked "Reduce performance in order to reduce noise" or something, which would take into account all the sources of heat in your machine rather than just your CPU. Encouraging the managing of acoustic levels by asking users to restrict the functionality of their CPU is just another way of saying "Look! We suck!". Letting the user choose a specific CPU governor or a specific frequency is not a useful thing to do. Don't do it unless you want to see dead kittens. Delivered by UPS.

[1] And, presumably, whatever else Intel and everyone else ended up licensing off them which resulted in their reinvention as an IP company rather than a CPU one, but that's just not interesting to me.

[2] Even ignoring the people that are unhappy for entirely unrelated reasons, such as injured toenails or the brutal murder of their family.

[3] There's a corner case here, which is a system that is always entirely CPU bound. Say we halve the CPU's speed. Along with the voltage drop, that gets us down to about 20% of the original power consumption. Of course, it now takes twice as long to do anything, and your screen, RAM, hard drive, chipset and so on are still drawing power, so they will end up costing you twice as much energy as they would have done if you'd run at full speed. If you do the maths, it works out that you save power if your processor's full-speed power consumption is more than 1.7 times that of the rest of the platform. In the real world, things are made more complicated by the rest of your platform consuming more power if you're working over a longer period of time: your hard drive is going to end up spending more time spun up, your memory bus is going to be active for longer and so on. You're basically not going to hit this case.

[4] While the burning of your body will result in carbon emissions, the reduction in power usage should offset this in the long run.
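The arithmetic in footnote [3], written out. P_c is the CPU's full-speed power, P_r the power of the rest of the platform and T the job's duration at full speed (the symbols are mine); halving the speed doubles the time and, per the footnote, cuts CPU power to about 0.2 P_c:

```latex
\begin{aligned}
E_{\mathrm{full}} &= (P_c + P_r)\,T \\
E_{\mathrm{half}} &= (0.2\,P_c + P_r)\cdot 2T = (0.4\,P_c + 2P_r)\,T \\
E_{\mathrm{half}} < E_{\mathrm{full}}
  &\iff 0.4\,P_c + 2P_r < P_c + P_r
   \iff P_r < 0.6\,P_c
   \iff P_c > \tfrac{5}{3}\,P_r \approx 1.7\,P_r
\end{aligned}
```

This ignores the extra platform power from staying active longer that the footnote mentions, which only makes running slowly look worse.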

May 08, 2008 11:19 PM

Dave Jones: OSX fail.

For an OS that apparently "just works", I seem to encounter a remarkable amount of fail.
Some investigation reveals all manner of disasters in syslog:

kernel[0]: IOAudioStream[0x4634100]::clipIfNecessary() - adjusting clipped position to (d0b9d,3fd9)
kernel[0]: IOAudioStream[0x4634100]::clipIfNecessary() - Error: attempting to clip to a position more than one buffer ahead of last clip position (d0b2f,3ff9)->(d0b9f,4b).

The best part about this fail is that it does it even when no audio is playing. I just need an app that plays audio to be open.
Result: A gazillion of these in the logs.

usbmuxd[104]: MuxTCPSendRST StartWrite failed: 0xffffffff
usbmuxd[104]: MuxInterfacev1Receive Dropping packet. Received 26036, expected -769822944 bytes

lol whut?

kernel[0]: AppleYukon2: 00000000,00000001 sk98osx sky2 - - sk98osx_sky2::replaceOrCopyPacket tried N times

Good job. Please try n+1 times.

This is just from today's log, and just the kernel messages. The spew from userspace freaking out about this, that and everything is even worse.

But at least it looks pretty whilst it's failing.

May 08, 2008 01:21 PM

May 07, 2008

Dave Jones: comcast fail.

Comcast's DVR is the only device I've ever owned that seems to consistently get worse with every firmware update: from the removal of useful shortcuts to invasive advertising in the program guide, which reduces the screen space for useful stuff (like program info). The last two updates have really got under my skin, though.

After playing back a recorded program, it used to pop up a dialog asking if you wanted to delete it. This no longer happens. Instead you stare at a frozen image. When you press 'stop' the dialog appears (it also appears after 15 seconds of staring at the frozen screen). According to forum posts I found somewhere, this 'feature' was added because some customers (read: 'morons') felt "panicked" and "rushed" into making a decision. Comcast's official line is that this feature was added so that people could rewind from the end and watch the last scene of a program again. Sounds great. The best part is, this was always possible. Even when the dialog was onscreen.

The latest "feature" that drives me up the wall: the DVR box has an LCD on the front displaying the time, or the current station being watched. When in standby mode, it shows the time. This is really handy; it means I don't need another clock in the room. Or at least that was the case until the last firmware update. Now, after a period of inactivity (i.e., whilst in standby for a while), the LCD powers off. I'd like to think there's some benefit to this, like saving power. However, due to Comcast's ineptitude, the DVR box actually uses the same amount of power in standby mode as it does when powered up. Winning. Net result: I no longer get to put my DVR in standby mode. (Not that it ever did anything anyway.)

Every time this gets under my skin, I get renewed enthusiasm to get mythtv set up. Then I remember what a pain in the ass that was the last time.

May 07, 2008 11:25 PM

Pavel Machek: Kohjinsha wifi driver

The Kohjinsha wifi driver is now updated to 2.6.26-rc1, and should be available in my work git tree. I won't have the Kohjinsha with me next week, so don't expect any development. But if you want to clean up some really challenging code, feel free to help. Longer term, some testers would be very welcome.

May 07, 2008 08:52 PM

Evgeniy Polyakov: Fast transactions in POHMELFS.

POHMELFS just switched to faster transactions, allocated one by one with even smaller overhead (although it does not yet use kernel_sendpage() to send pages; it copies the data).
The system still does not wait for a whole batch of transactions to complete (it waits after each one), but with the new transaction allocation it is 1.5 times faster: 98 MB/s vs. 64 MB/s. Note that without waiting for transaction completion it reaches the full wire speed of 125 MB/s with a 1500-byte MTU. And that is with highmem pages, and thus a slow kmap() of each one and an unmap after completion. I do not use ->sendpage(), since it would force splitting a proper set of iovecs into mixed calls of kernel_sendmsg() and kernel_sendpage(), which I want to avoid for now. It is now (again) faster than NFS, but I want to go further.
So the solution is rather trivial: wait until several transactions have completed. The whole infrastructure is already there: in-flight transaction storage, per-transaction completion and destruction callbacks, proper reference counting and async completion.
Still, only write transactions are used (i.e. reads, lookups and other operations will not be redirected to different servers).
There are some bugs, of course, but this is the first development version after all.

May 07, 2008 06:00 PM

May 06, 2008

Matthew Garrett:

One of the advantages of working in a biology lab is getting deliveries of stuff in dry ice and getting to play with the dry ice afterwards. Sadly, while dry ice is clearly very cool (a-ha ha ha ha ha (dies)) it makes a lousy way of cooling down your drinks[1]. With a latent heat of sublimation of merely 199kJ/kg, CO2 draws less energy out of the liquid than ice's latent heat of melting of 334kJ/kg. That's easily dealt with by using larger blocks of dry ice, but the fundamental problem seems to be that most of the sublimed CO2 boils straight out of the glass and just gently cools the atmosphere instead. Maybe ethanol cubes are the way forward.

[1] Much like Red Stripe, it is also a lousy fabric softener.

May 06, 2008 10:48 PM

Val Henson: Sometimes it's the hardware

A few years ago, I bought this cute little USB hub from Fry's. It was one of those faux-Apple style pieces of hardware, with that milky-glass look, ultra-small and stylish. I used it to connect my keyboard, mouse, and printer to the laptop, but the printer only worked when it was connected directly instead of through the hub. I cursed Linux. Then I started working on the MacBook Air with the same setup, and when the printer didn't work with Mac OS X, I cursed printer firmware writers. Then my keyboard kept inserting random characters, kind of like there was an echo delay. It didn't do that when connected directly, only through the hub. Hey... wait a minute... I switched to the new, even cooler USB hub I bought on a whim last time I was at Fry's, and voila, printer and keyboard work perfectly. It never even occurred to me that it could be hardware.

Oh yeah, the MacBook Air. I love it.

May 06, 2008 08:52 PM

Evgeniy Polyakov: New captcha solving problem.

In case you notice some delay in filesystem or network development, the reason is simple: I decided to devote some time to a new captcha-cracking problem, namely this one:

Captcha problem

The reason is simple: I want to test my captcha-breaking ideas on something real. I was also frustrated by their abuse team, which was not able to fix its spam filter based on the messages I sent them (the bounce and the original, just as requested).
It is pretty unlikely that anything will appear soon, though, but I do want to test some ideas...

May 06, 2008 04:00 PM

Dave Jones: wifi detectors.

I thought this was absurd when I saw it. Given the ubiquity of 802.11 signals pretty much everywhere (even at my parents' place in the boonies of the Welsh valleys I noticed multiple APs), it seems kind of pointless. The silliness doesn't stop there, however. Those with a penchant for looking ridiculous can now also get some matching shoes.

FAIL.

Wifi-detecting clothing is the Hypercolor of this decade.

May 06, 2008 02:41 PM

Jaya Kumar: Technology you already have

Paul Krugman wrote a post titled "A technology I wish I had" where he says:

"Now, if I had a machine I could work on that used electronic ink, the way the Kindle does, I could be listening to birdsong this very minute.".

All I can say is that that is already done. :-) Yeah, it's not yet clean as a whistle but it does work.

Deferred IO gets you to the point where standard X11 works, so you can run whatever you want. You could run standard apps on the Kindle: it uses a 400MHz XScale PXA 250 CPU, runs Linux, and has a standard USB host port, so tack on a keyboard and you've got yourself a "workable machine". Nothing exceptionally complicated there. By the way, Lab126 kernel guy, if you're reading, drop me an e-mail.

May 06, 2008 08:09 AM

May 05, 2008

Evgeniy Polyakov: POHMELFS transaction support. Failover (re)connection to different servers.

POHMELFS just got full transaction support. So far it is only used in the ->writepages() callback, which is invoked by the writeback mechanism. POHMELFS uses lazy transaction support, namely it waits after each transaction, which includes a header and the data to be written for at most 14 pages (14 is a magic number of pages that corresponds to the size of struct pagevec used by generic writeback). The transaction size is limited by a mount option and is 32 pages by default. Performance dropped from 125 MB/s to 64 MB/s, which is not acceptable. The main problem is, of course, waiting for each transaction to complete (i.e. for the completion message from the server). There should be no per-transaction waiting; instead, writeback has to allocate as many transactions as needed, send them one after another, and only start waiting for them when there are no more pages to be written. This is the next task.

The transaction mechanism allows fairly simple reconnection to a different master server in case of failure, and rollback of the failed transaction. For example, one can provide several main servers (which have to be in sync with each other and able to synchronize themselves, or they can simply use shared storage), and the POHMELFS client will switch between them if the current one fails. The system will detect the failure and reconnect; if reconnecting fails, the next server will be used and the whole transaction will be resent there.
It is also possible to write a transaction to a different server on demand (it may or may not be connected already, but it has to have an address structure, which so far is only obtained during pre-mount configuration), which is a prerequisite for parallel data processing. One could create a simple patch to write transactions to servers one after another in round-robin fashion.
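The failover and round-robin behaviour described above can be sketched as a small shell model. Everything here is hypothetical (the SERVERS list, send_txn, send_with_failover, pick_round_robin); it only shows the policy of "reconnect once, then move to the next server and resend the whole transaction", not how POHMELFS implements it in the kernel:

```shell
#!/bin/sh
# Hypothetical sketch of the failover policy described above; not POHMELFS
# code. send_txn SERVER TXN is a caller-supplied stand-in that returns 0 if
# SERVER accepted the whole transaction.
SERVERS="srv-a srv-b srv-c"   # hypothetical server names

# send_with_failover TXN: try each server in turn (one attempt plus one
# reconnect attempt), resending the whole transaction to the next server
# after a failure; print the server that accepted it.
send_with_failover() {
    for srv in $SERVERS; do
        if send_txn "$srv" "$1" || send_txn "$srv" "$1"; then
            echo "$srv"
            return 0
        fi
    done
    return 1   # every server failed
}

# pick_round_robin N: server for the N-th transaction (N starts at 0), for
# spreading successive transactions across servers.
pick_round_robin() {
    n=$(set -- $SERVERS; echo $#)
    i=$(( $1 % n + 1 ))
    echo $SERVERS | cut -d' ' -f"$i"
}
```

With srv-a down, send_with_failover lands the transaction on srv-b; pick_round_robin 4 with three servers picks srv-b.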

Right now only write transactions are used (and they can be combined with object creation if needed); read transactions are still pending, as are multiple parallel transactions (which are not complex; the main task is how to wait for all of them to complete, and very similar code is used in pohmelfs_aio_read()).

Cache coherency support is also pending (server-originated messages to clients that use the same pages another client is writing into, including metadata coherency messages for uid/gid/inode size and other changes). It is not a very complex task, and mostly requires server modifications.

Stay tuned!

May 05, 2008 07:00 PM

Dave Airlie: compiz on rs480/rs690 working!!

So if you have a DRI enabled ATI rs4xx/rs6xx integrated chipset and have been cursing my name because compiz doesn't work without foobar'ed textures, you can stop the cursing and start the praisin...

I've found the bug in the r300 swtcl path that caused this. The 3D driver uses the fragment shader to do rectangular textures, and this involves feeding the texture dimensions into the fragment shaders as constants. However, the code to update those constants for new textures wasn't always getting called at the correct time in the swtcl path.

So http://cgit.freedesktop.org/mesa/mesa/diff/?id=a7016949f27f7612ffba7a4d0c5e6280cb3e66ba

is the fix, and it is now in Mesa master. I'll pull it into the Mesa 7.0.x branch and will probably release F8 and F9 Mesa packages with the fix. This probably won't make F9 GA, but the 0-day Mesa update will contain the fix.

This bug has been annoying me since July 2007 so woot!!

May 05, 2008 04:06 AM

May 04, 2008

Evgeniy Polyakov: Tanks in the city!

Wanted to visit Moscow and see how we play the balalaika, drink vodka and walk with bears?
Not now, we drive our tanks instead.



The Victory Day parade rehearsal, on the night of April 29.

May 04, 2008 03:00 PM

May 03, 2008

Matthew Garrett:

It turns out that I was insightful four and a half years ago, and I should damn well listen to my own opinions.

May 03, 2008 11:50 PM

Matthew Garrett:

People rarely ask me about my amazing ability to remove fruitfly ovaries in under 10 seconds. "This skill makes me strangely attracted to you" they then inevitably fail to add, thus giving the lie to claims that a PhD in biology is a great way to impress women. In order to avoid future social embarrassment I have therefore decided to provide a step-by-step guide to removing ovaries from any fruitflies you may have to hand.

See? It's all very easy.

(Note: This protocol involves chemicals known by the state of California to cause cancer, birth defects and general fucked-upness. Do not ingest methanol unless you want to demonstrate competitive inhibition in the form of a bottle of vodka. Value of fruitfly ovaries may go up as well as down. Do not taunt fruitfly ovaries)

May 03, 2008 02:41 PM

May 02, 2008

Evgeniy Polyakov: Design of the POHMELFS transaction model.

It is heavily based on how netlink is implemented in the Linux kernel. Although netlink is probably the ugliest and most complex protocol among the communication models supported by the kernel, it is also the most efficient, extensible and feature-rich one.
This model is based on attributes embedded in the message. Each attribute has a header that includes the size of the attached data. So one can put an effectively unlimited amount of data into any message (limited only by the size field and practical constraints of the communication), and it is possible to create a message containing any number of different attributes.
The main problem with netlink is its ugly padding and alignment. The protocol tries to squeeze every bit out of the communication, so there are a huge number of very hairy things there.
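To make the padding rules concrete: each netlink attribute starts with a 4-byte header (struct nlattr: a 16-bit nla_len and a 16-bit nla_type), nla_len records the header plus payload without padding, and the next attribute begins at the next 4-byte boundary. The sketch below mirrors the kernel's NLA_ALIGN/NLA_HDRLEN macros and the nla_attr_size(), nla_total_size() and nla_padlen() helpers in shell arithmetic, purely for illustration; msg_attrs_size is a helper made up here:

```shell
#!/bin/sh
# Shell-arithmetic mirror of the Linux netlink attribute sizing rules, for
# illustration only. Names follow the kernel's NLA_* macros and nla_*()
# helpers; the real definitions are C in the kernel headers.
NLA_ALIGNTO=4   # attributes are padded to 4-byte boundaries
NLA_HDRLEN=4    # struct nlattr: 16-bit nla_len + 16-bit nla_type

nla_align()      { echo $(( ($1 + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1) )); }
# Length stored in nla_len: header plus payload, without padding.
nla_attr_size()  { echo $(( NLA_HDRLEN + $1 )); }
# Bytes the attribute really occupies in the message, padding included.
nla_total_size() { nla_align "$(nla_attr_size "$1")"; }
# Padding bytes appended after the payload.
nla_padlen()     { echo $(( $(nla_total_size "$1") - $(nla_attr_size "$1") )); }

# msg_attrs_size P1 P2 ...: space used by attributes with these payload sizes.
msg_attrs_size() {
    sum=0
    for p in "$@"; do sum=$(( sum + $(nla_total_size "$p") )); done
    echo "$sum"
}
```

For example, attributes with 5-, 8- and 1-byte payloads take 12 + 12 + 8 = 32 bytes, of which 3 + 0 + 3 are padding.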

I like to drink, and (un)fortunately I have sometimes had pretty bad-quality drinks, but I'm absolutely sure that when Alexey Kuznetsov designed the netlink attribute alignment policies, he had a really bad hangover after probably the worst crap he ever drank.

So netlink attributes are very ugly, but you can extend them however you like.
The same applies to POHMELFS transactions.

You can put any new attribute into a transaction in a very simple manner (I have worked with netlink a lot, and even created the kernel connector to simplify the kernel development side, so I know that taste). Transaction size is limited, but only by a mount option (the default is 32 IO vectors, each of PAGE_SIZE (4k on x86), in one transaction).

Thus one can easily implement, for example, any protocol security labeling: just add a new per-packet attribute.

So the communication protocol can easily be extended indefinitely with full backward compatibility.

May 02, 2008 07:00 PM

Pete Zaitcev: BadName is essentially conquered

The issue with random applications failing to start (Firefox, Nautilus) or blowing up (panel, gvim) with BadName took me about 3 months to find (the bug was filed at the end of January). I'm not sure if my fix is any good, need to poke Ajax about it.

So... Wasted a lot of time, learned several mildly interesting things about the code and people involved.

The sad part is how much it takes to start moving around any modern codebase, and that's with the same language and toolchain. I remember times when no part of the system was off-limits, but these days... not so much. If anything breaks in OpenOffice, I'm not even going to try fixing it.

May 02, 2008 10:23 AM

James Morris: Labeled NFS Requirements Draft Submitted

Dave Quigley has just submitted an Internet Draft to the IETF outlining the requirements for Labeled NFS:

MAC Security Label Requirements for NFSv4 (link)

Abstract

This Internet-Draft outlines high-level requirements for the
integration of flexible Mandatory Access Control (MAC) functionality
into NFSv4.1. It describes the level of protections that should be
provided over protocol components and the basic structure of the
proposed system. It also gives a brief explanation of what kinds of
protections MAC systems offer and why existing NFSv4 protection
mechanisms are not sufficient.

This draft is a generalization of the original Security Enhanced NFS document posted last year, addressing the general need for mandatory access control support in NFS.

NFSv4 currently supports two access control schemes: standard DAC and ACLs. MAC labeling support is required for technologies such as SELinux and OpenSolaris FMAC.

Essentially what's needed is a way to convey MAC labels over the wire (for both setting and retrieving their values), and to be able to enforce security policy using those labels. The server needs to be able to determine the security label of the remote client process when enforcing policy, and all systems need to be able to ensure they understand each other's labels, or be able to translate them. A "Domain of Interpretation" (DOI) attribute is used to determine the meaning of labels, a term which may be familiar to those who've braved the IPsec specifications. The confidentiality and integrity of these security attributes must be protected in transit, while all parties need to be authenticated. We also need to be able to handle the case where either the client or server does not have MAC enabled, and to ensure non-breakage with existing implementations. There's a lot more in the details, but that's the gist of it.

It may seem at first glance that NFSv4 named attributes (NAs) would provide the required labeling functionality, but they're not a good fit. NAs are specified as opaque to the system and user-managed, while MAC security labels are managed by the system. NAs also do not provide necessary semantics such as conveying client security attributes or negotiation of DOI. There are also issues with attribute namespaces (which are user-managed and unspecified) and labeling atomicity. Another possible approach is to implement Linux/BSD-style extended attributes (EAs), which are simple text string attributes associated with files, in contrast with the NA "subfile" scheme. This would potentially only solve the attribute namespace issue, and is also not a good general solution. EAs are also not currently part of the NFSv4 specification, and it seems like a contentious area in any case.

The current Labeled NFS prototype code utilizes NFSv4 recommended attributes (RAs), which are fully extensible, already exist, and are already used for similar management of metadata (e.g. ACLs). This seems to be the simplest and most straightforward approach.

Once there's consensus on the requirements, the next step will be to develop a protocol specification and hopefully have it incorporated into NFSv4. v4.1 is currently in "last call", so the next candidate would be v4.2, it seems. The prototype code for Linux/SELinux will continue to be developed alongside the standards process.

For those interested in following or contributing to the project, there are several relevant mailing lists:

Dave is hoping to have further discussion at IETF 72 in July, and will be presenting on the state of the project at the SELinux Developer Summit ahead of that.

May 02, 2008 12:54 AM

May 01, 2008

Dave Jones: a very bizarre lunchtime.

I went to lunch at one of the town's 49 all-quite-the-samey sushi restaurants that I'd not tried yet. I was seated next to someone who seemed to be some kind of dial-a-psychic. She had tarot cards all over the table, and appeared to be giving a 'reading' over the phone whilst she gobbled down maki. She had a companion who was also on the phone during the entire meal. I don't think they spoke a word to each other. I'm not sure if she was also a psychic. The whole experience was very surreal.

Then they left, and five minutes later the 'psychic' returned looking for her lost credit card or something. Priceless.

May 01, 2008 06:37 PM

Pavel Machek: Slowest machine

...to run the "quick benchmark" was Wouter Verhelst's Mac IIci (running Linux on a 68030) -- thanks!

time factor $((65863223*65863159))
4337959928701457: 65863159 65863223

real 6m16.660s
user 6m9.920s
sys 0m4.530s

...that's 376 seconds, and almost exactly 1000 times slower than my home desktop.

The fastest machine so far seems to be djwong's Xeon E5450 (3GHz): 0.116s, more than 3000 times faster than the Mac IIci.

Do you have a faster or slower machine?

May 01, 2008 06:09 PM

April 30, 2008

Evgeniy Polyakov: B.B. King bar.



A bit more in the gallery.

Cool place with interesting people... It was a bit loud and not very convenient for seeing the band, but it was fun nevertheless.

April 30, 2008 12:00 PM

Dave Miller: Effective GIT bisecting...

I've had to do a lot of this lately, and the most efficient attempts take on a certain pattern.

So you have a bug, and you can readily reproduce it. Also, the bug appeared in the last pull you made from Linus's tree. Perfect.

At this point you know that 'master' has the bug and that 'ORIG_HEAD' lacks the bug. You could just blindly bisect the whole thing, but you can save yourself some time (and also learn a bit about the nature of the bug) by using some clues and some quick tests to narrow things down a lot from the beginning.

This determination can be easy. Your goal is to first find a spot which you think works. You'd like it to be something a bit further than ORIG_HEAD, as you're trying to narrow things down.

The easy case is when some driver breaks or similar, or you see some error message and it's clear what subsystem it came from. Take that information and use it to scan over the changesets you got from your pull:

	gitk ORIG_HEAD..
Note that when you select a changeset in gitk, the SHA ID of that commit becomes the current X selection. You can use that to do things more quickly below.

So you've found a sequence of commits that look suspicious. Pick the changeset before the first commit in the suspicious set, and check out a test tree with it as the tip into a test branch:

	git checkout -b test <SHA_ID>    # paste the commit ID gitk put in the X selection
Build that kernel, and make sure the bug doesn't happen. Let's assume that this kernel passes your test. You have a few options on how to proceed.

The easiest thing to do is to just bisect using the information you now have:

	git bisect start
	git bisect bad master
	git bisect good test
and so on. Build, test boot, and if it shows the bug:
	git bisect bad
else if it succeeds:
	git bisect good
and repeat the process until GIT shows you the guilty commit.

The other option is to try to figure out a better, approximate end point for the bisection. Take the set of "suspicious" changesets you determined above, take the one after the last of them, and go:

	git checkout -b test2 $(SHA_ID)
If this shows the bug, you're in business:
	git bisect start
	git bisect bad test2
	git bisect good test
and continue as detailed above.

When you're done with all of this:

	git bisect reset
and report your results to the mailing list.

April 30, 2008 06:08 AM

April 29, 2008

Evgeniy Polyakov: New and old toys.

Real enlargement...



The only thing missing is photo skills...
But I'm working on it.

After spending quite a lot of money, I suddenly realized that it is a really good feeling to have what you want, no matter what the price is. I can't afford some things, but looking really closely, I've decided that having lots of smaller, really cool stuff is better (for now) than saving for a (really) long time to get something really big. I already did that; now it's time for smaller everyday fun :)

So, no bike for now. I was torn between a Honda CBR 400-600, a BMW K1200 or similar, or classic chopper models (no Harley, of course), but... Anyway, I'm not able to register it and get plates, and I don't have a motorcycle license.
The same applies to cars (what I already had I really don't want to get again, but what I want requires some). So, my simple stuff.

April 29, 2008 08:00 PM

Dave Jones: Fun with an OQO

Just as the novelty of the eeepc was beginning to wear off, today I got a model 2 OQO in the mail. It's a pretty nifty little device. First impression on opening the box and pulling it out was that it was a little 'chunkier' than I was expecting. It's pretty heavy at 3lbs. A whole pound heavier than the Eee. Similar slide-out keyboard to the n810. Much bigger screen.

It comes with Vista pre-installed. I powered it up, just to make sure it's all working. This part took forever. It sat at a "please wait" screen for ages. After 15 minutes, and two reboots, I got to set up my user account. It then spent another five minutes "checking my computer's performance" and god alone knows what else. During this time, the device got really hot, and the fans started running full tilt. For considerable amounts of time whilst I was waiting for it to do something, I was staring at a black screen with a mouse pointer. I had no idea if it had crashed, or was actually doing something. This was my first Vista 'experience', but before I'd even gotten to a desktop, I'd decided that modern Linux installations are leaps and bounds ahead in terms of user experience in this regard.

Finally, 25 minutes after I'd hit the on button, I got to a desktop. I moved the mouse pointer, and the screen changed to "shutting down". It rebooted. What was this, the 4th, or 5th time? I'd lost count. After another minute, it had booted up. I fiddled around a little, before quickly becoming bored with it. The fans ran almost constantly. Sitting at an idle desktop, Vista pulled around 9 watts, with spikes every few seconds at 13W. Occasionally, it would go as high as 15W. Again, it was completely idle all this time.

After getting bored with trying to beat Vista into using my wireless, I rebooted, and found my way into the BIOS (Fn-Del). From there, I found numerous things to twiddle (like enabling PXE as the first boot choice). Surprisingly, there was also an "Enable ACPI CPU C4" option, which was disabled by default. Enabling it didn't cause Vista to use any less power. I guess it's being woken up so frequently that it never gets into those lower states. Given that Linux can and will exploit it, however, I left it enabled.

Booted up a rawhide install over PXE. Idling, it bounces around 8.4 to 8.9 watts. This isn't an apples to apples comparison though. Sitting in Anaconda is a lot less intensive than an 'idle' desktop.

At this point, I hit my first problem. The ethernet (a Realtek RTL8139, which should be well understood) isn't noticing a link, and hence fails to get a DHCP lease. After some futzing around creating a debug initrd with a shell, I discovered that the chip is detected just fine. But with one caveat: it detects the MAC address as 00:00:00:00:00:00.

More poking later, but for now, I'm drawing a blank.

April 29, 2008 05:29 PM

David Woodhouse: 29 Apr 2008

I pay my telephone bill to British Telecom by Direct Debit — it's taken from my bank account directly, under their control (albeit with fairly decent safeguards).

Strange, therefore, that I got a call yesterday from their missing payments department chasing up my last bill, which they hadn't bothered to take for some reason. They left a message with a number to call them back on, and a reference number. Yet when I called, the person there seemed unable to do anything useful like checking why they hadn't bothered to take the payment. She just said she'd have to get someone more clueful to call me. I wonder why I wasn't asked to call that person in the first place?

Immediately after the call I checked my bank statement, and it seems that the Direct Debit was actually taken — yesterday. So I helpfully called back and told them that, since they didn't seem clueful enough to work out for themselves what they were doing.

Today I got another phone call, and another British Telecom representative lied to me by telling me that the bill was still unpaid.

Fucking Useless Telco.

April 29, 2008 02:19 PM

James Morris: 2008 SELinux Developer Summit Schedule Now Up

We managed to get the SELinux developer summit schedule published a few days early. Hopefully, this will help people who are making travel arrangements to OLS.

As mentioned, a lot of high quality proposals were submitted. To ensure that all important topics can be covered, the format of the summit has been changed to moderated discussion panels with presentations, rather than the original plan of having a set of fixed-length presentations followed by discussion panels.

Presentations will now be 10-20 minutes, with a greater focus on discussion. This provides much more flexibility, and is derived somewhat from experience with the kernel networking summit, which has been very successful with short presentations driving discussions.

The panel sessions are as follows:


More detailed information, including topics, issues, and links to abstracts may be found at the schedule page. Also see the printable version and the topics page.

All SELinux developers and folk with a technical interest in SELinux and related technologies are welcome to attend. Don't forget that you also need to be registered to attend OLS.

April 29, 2008 01:26 AM

Matthew Garrett:

Why has the phrase "Zuckerberg's famous pig" not appeared on the internet up until this point?

April 29, 2008 12:36 AM

April 28, 2008

Matthew Garrett:

As Adam mentioned, Facebook has a new feature to let you search for the frequency of various words used in communication over time. You probably need to be a Facebook user to see any of these, but interesting ones include never drinking, rape (tends to peak on Sundays) and dead, which demonstrates the somewhat bizarre effect of Heath Ledger on the internet.

April 28, 2008 09:59 PM

Evgeniy Polyakov: POHMELFS transactions and ACID.

POHMELFS just got initial transaction support and the ability to connect to multiple master servers. Master servers are the ones that say where data is placed. Essentially they are the same servers that may provide that data, but master server addresses are supplied at pre-mount configuration time, while data server addresses will be provided by the master servers at run time (if the master servers do not want to return the data themselves).
Master servers can also be used to request data in parallel, or to switch to another one when the currently active one fails.

So far this is theory, and practice is rather miserable: the POHMELFS client connects to multiple servers, but works with only one. Errors are detected, and switching to the next server could happen, but it is not done, because there is a serious problem with this approach: neither the server nor the client supports ACID for data being written.

This is where transactions come in: a transaction is multiple commands wrapped into a single atomic operation. If an error occurs while a transaction is being written, the whole transaction will be resent to a different server (or to the same one after reconnecting). This is rather simple (although the server does not support transactions yet, and the client does not wrap any commands in them yet), but it still does not solve the ACID problem.
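The sketch below shows the "resend the whole transaction" rule on its own. It is not POHMELFS code. `send_transaction` and the `send_to` transport hook are hypothetical names. A transaction is modeled as one batch of commands that is either acknowledged as a whole or retried as a whole on the next server.

```shell
#!/bin/sh
# send_transaction "SERVERS" CMD...: try each server in order until one
# accepts the whole batch. send_to SERVER CMD... is a hypothetical hook the
# caller must define; it returns 0 only if SERVER acknowledged every command.
send_transaction() {
    servers=$1; shift             # space-separated list, tried in order
    for srv in $servers; do
        if send_to "$srv" "$@"; then
            echo "$srv"           # report which server committed the batch
            return 0
        fi
        # This server failed: retry the complete batch on the next one.
    done
    return 1                      # no server accepted the transaction
}
```

To model retrying the same server after a reconnect, list that server twice.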

Since POHMELFS has a writeback cache, writes do not go to the server directly. Instead, writeback is scheduled by the system, which then starts writing pages to the server. The current POHMELFS implementation uses only the ->writepage() method, which is invoked for each page. It does not require the server to return an explicit acknowledgement that a page was written. Instead, it relies on the underlying transport protocol (such as TCP) to handle guaranteed delivery. However, data can still be queued somewhere when the connection drops, so the POHMELFS client does not know whether the data was really written. Per-page acknowledgements would fix the ACID problem really trivially, but they may (or may not) cause severe performance degradation. As a better solution, I am considering my own ->writepages() implementation, where each transaction contains multiple pages to be written. That means fewer explicit acks to receive from the server, and therefore a smaller performance cost. In case of failure, the whole transaction of course has to be resent to a different server.
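The saving from batching can be put as simple arithmetic. The numbers are assumptions for illustration only: 4 KiB pages, and a transaction of k pages, where k is a hypothetical tuning knob rather than a POHMELFS parameter. Writing back N dirty pages then needs this many acks:

```latex
A_{\text{per-page}} = N, \qquad A_{\text{txn}} = \left\lceil \frac{N}{k} \right\rceil

N = \frac{1\,\text{MiB}}{4\,\text{KiB}} = 256, \quad k = 16
\;\Rightarrow\; A_{\text{txn}} = \left\lceil \frac{256}{16} \right\rceil = 16 \text{ acks instead of } 256
```

The cost moves to the failure path: a lost transaction means resending up to k pages rather than one.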

The server does not support data mirroring to multiple root directories yet, so not much of the above is actually implemented. Transactions and multiple server connections do exist, though, and the client will soon get support for reconnection and proper transaction processing.

April 28, 2008 06:00 PM

Dave Airlie: Going for surgery tomorrow...

Well I've had a badly aligned septum for a few years so I'm going in tomorrow to have it fixed. I'll be doing very little for the next couple of days and I have to stay at home for a couple of weeks!!

April 28, 2008 05:56 AM

Jaya Kumar: Lim Kit Siang is cool

I'm so happy to notice this post by Jeff Ooi who is now an elected member of parliament. I hope the UMNO regime will be forced to free all the political prisoners soon.

April 28, 2008 06:03 AM

James Morris: SELinux documentation in Portuguese / Monografia sobre SELinux

Jeronimo Zucco has published some SELinux documentation in Portuguese:
Hardening Linux Usando Controle de Acesso Mandatório.

Relatedly, I just read Spot's report on attending FISL in Brazil. Sounds like it was an exciting and productive event. With over 7000 attendees, I wonder if this was the largest FOSS conference ever?

April 28, 2008 01:08 AM

April 27, 2008

Muli Ben-Yehuda:

I spent a couple of pleasant hours today trying to wrap my head around KVM's MMU code. After reading "The Shadowy Depths of the KVM MMU", it suddenly starts to - almost - make sense.

The tentative agenda for the 2008 KVM forum has been published and is chock-full of awesomeness.

April 27, 2008 07:58 PM

Pavel Machek: Single AA phone charger

I have 10+ NiMH accumulators nearby, but my cellphones still sometimes run out of power.

My first solution was to create a pack for 4 of these, with a USB connector, and charge from that. That did not work too well.

Now I got this toy... and it seems to work quite well. It may also be a solution to my "1W headlamp from a single battery" problem: it provides ~500mA@3.6V with a full accumulator. Is there some simple current-limiting circuit I could use? (The regulator seems to have a 5.5V max, I need 300mA for a 1W LED, and it would be very nice if it were possible to limit the current down to 10mA or so.)

The manufacturer says a cellphone gets charged for ~1 day of standby from a single AA, and that seems to be about right. (Actually, the n6230 seemed to last 1.5 days, but the test left something to be desired.) I wonder how big the conversion losses are: 1.5 days is definitely not the full capacity of the Li-ion battery in the n6230, which is 700mAh ~= 2520mWh. A single NiMH cell should be ~2100mAh ~= 2520mWh, too...
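The comparison above is charge multiplied by nominal cell voltage. This assumes nominal voltages of 3.6 V for the phone's Li-ion cell and 1.2 V for a NiMH cell, which are the voltages the post's numbers imply. If standby drain is roughly constant, the overall charging efficiency can then be estimated from standby times alone:

```latex
E = Q \cdot V, \qquad
E_{\text{Li-ion}} = 700\,\text{mAh} \times 3.6\,\text{V} = 2520\,\text{mWh}, \qquad
E_{\text{NiMH}} = 2100\,\text{mAh} \times 1.2\,\text{V} = 2520\,\text{mWh}

\eta \approx \frac{t_{\text{standby from one AA}}}{t_{\text{standby from a full phone battery}}}
```

This is only a rough estimate. It lumps together the charger's conversion loss, the phone's own charging loss, and any shortfall of the AA cell below its rated capacity.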

April 27, 2008 07:32 PM

Pavel Machek: pavel's quick benchmark

This one actually has a huge advantage: factor seems to be part of the default installation, so it is easy to run on all machines, down to the sharp zaurus.

It also shows just how slow those embedded CPUs are. I always assumed that arm@400MHz was approximately as fast as pentiumMMX@200MHz. ... That's quite far from the truth.

root@amd:~# time factor $[65863223*65863159]
4337959928701457: 65863159 65863223

amd: 0.591 s
eee @ 630: 1.673 s
eee @ 900: 1.858 s ?!
arima: 0.536 s
hobit: 0.376 s
zaurus: 21.99 s
kohjinsha: 4.062 s
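A small script that repeats the measurement makes runs on different machines easier to compare. It is a sketch that assumes bash (for the `time` keyword and `TIMEFORMAT`) and GNU coreutils `factor`. `factor_result` and `bench_factor` are hypothetical helper names. It reports wall-clock seconds, like the numbers listed above.

```shell
#!/bin/bash
# Repeatable version of "pavel's benchmark": factor a 16-digit semiprime.
factor_result() {
    # $((...)) is the POSIX spelling of the older bash-only $[...] syntax.
    factor $((65863223 * 65863159))
}

bench_factor() {        # bench_factor [RUNS]: print elapsed seconds, one line per run
    local runs=${1:-3} i
    local TIMEFORMAT='%R'                        # `time` reports only real (wall) time
    for ((i = 1; i <= runs; i++)); do
        { time factor_result >/dev/null; } 2>&1  # time's report goes to stderr
    done
}
```

Running `bench_factor 5` prints five timings. Taking the fastest one usually reduces noise from other activity on the machine.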

Surprising notes here: the Asus eee is much faster than the kohjinsha. (In fact, the eee has a pretty cool CPU; it can compile a kernel in about 15 minutes, compared to the kohjinsha's 90 minutes.) And even the kohjinsha's geode@500MHz is 5 times faster than the zaurus... So the zaurus is more like a pentium@100MHz.

And now... What's the best and the worst result you can get in pavel's benchmark? Does anyone still have a 386sx running? Or will some ARMs be even slower? And is there anyone with one of those 3GHz Pentium Extremes?

April 27, 2008 06:47 PM

Pavel Machek: Asus EEE to play with

Asus EEE is a pretty cool machine. It actually has a usable keyboard, and its linux distribution is very nicely done, and well integrated with the hardware.

Unfortunately, its ACPI BIOS is crap: no thermal zones, and battery values completely off. (It seems to only report values in percent, but it does so by claiming battery is 100mAh and counting from that. Oops).

It took me a while to figure out that the touchpad really has two buttons... it is one big button, and depending on whether you press the left or right side, it means a left or right click. Unfortunately, that means you can't even use X's 3-button emulation, and I have not figured out how to copy & paste between xterm (well hidden behind ctrl-alt-t) and the browser.

...speaking of which, this is actually my first machine where flash works including audio, and I did not even have to agree to stupid EULA. Skype also works, which is a plus (had to deal with ugly EULA there :-().

April 27, 2008 06:25 PM

Ted Tso: Donald Knuth: “I trust my family jewels only to Linux”

Andrew Binstock interviewed Donald Knuth recently, and one of the more amusing tidbits was this:

I currently use Ubuntu Linux, on a standalone laptop—it has no Internet connection. I occasionally carry flash memory drives between this machine and the Macs that I use for network surfing and graphics; but I trust my family jewels only to Linux.

More seriously, I found his comments about multi-core computers to be very interesting:

I might as well flame a bit about my personal unhappiness with the current trend toward multicore architecture. To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! I won’t be surprised at all if the whole multithreading idea turns out to be a flop, worse than the “Itanium” approach that was supposed to be so terrific—until it turned out that the wished-for compilers were basically impossible to write.

Let me put it this way: During the past 50 years, I’ve written well over a thousand programs, many of which have substantial size. I can’t think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading. Surely, for example, multiple processors are no help to TeX….

I know that important applications for parallelism exist—rendering graphics, breaking codes, scanning images, simulating physical and biological processes, etc. But all these applications require dedicated code and special-purpose techniques, which will need to be changed substantially every few years.

This is a very interesting issue, because it raises the question of what next-generation CPUs need to do in order to be successful. Given that it is no longer possible to just double the clock frequency every 18 months, should CPU architects just start doubling the number of cores every 18 months instead? Or should they try to concentrate a lot more computing power into an individual core, and optimize for a fast and dense interconnect between the CPUs? The latter is much more difficult, and the advantage of doing the former is that it’s really easy for marketing types to use some cheesy benchmark such as SPECint to help sell the chip, but then people find out that it’s not very useful in real life.

Why? Because programmers have proven that they have a huge amount of trouble writing programs that take advantage of these very large multicore computers. Ultimately, I suspect that we will need a radically different way of programming in order to take advantage of these systems, and perhaps a totally new programming language before we will be able to use them.

Professor Knuth is highly dubious that the multicore approach will work, and while I hope he’s wrong (since I suspect the hardware designers are starting to run out of ideas, so it’s time software engineers started doing some innovating), he’s a pretty smart guy, and he may well be right. Of course, another question is what we would do with all of that computing power. Whatever happened to the predictions that computers would be able to support voice or visual recognition? And of course, what about the power and cooling issues for these super-high-powered chips? All I can say is, the next couple of years are going to be interesting, as we try to sort out all of these issues.

Originally published at Thoughts by Ted. Please leave any comments there.

April 27, 2008 01:00 AM

April 26, 2008

Evgeniy Polyakov: Detailed POHMELFS roadmap.

Transaction support will be added to the kernel client. It may also be exported to userspace (in which case it will provide synchronous write-through operations).
The kernel client will also get locking support (fcntl() locks first, then more fine-grained ones). This is different from byte-range read/write locking, which will be done on the server. That may be exported to the client too (it will actually be part of the POHMELFS locking API, which will be used for fcntl() as well).
The simplest case is data invalidation in the client's cache (i.e. if one client issued a writeback for a given page, that page has to be marked as not up-to-date on the other clients). It will likely be done at the beginning of next week. For now it will be the last cache coherency item. The task is really simple because all data in the kernel client is processed asynchronously. The server will have to store not only an index of directories to watch for object changes, but also a per-object set of pages read by clients, so that the appropriate users can be notified that a page is no longer up-to-date and has to be refreshed.
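The server-side bookkeeping described above can be sketched in a few lines of bash. This is a hypothetical model, not POHMELFS code. `record_read` and `invalidate_on_write` are made-up names. The per-object page set is represented as a bash associative array keyed by "object:page", whose value lists the clients that have read that page.

```shell
#!/bin/bash
# Hypothetical model of server-side page invalidation (not POHMELFS code).
# For each "object:page" the server remembers which clients have read it;
# on writeback it lists every other reader, whose cached copy is now stale.
declare -A readers          # "object:page" -> space-separated client IDs
STALE=""                    # result of the last invalidate_on_write call

record_read() {             # record_read OBJECT PAGE CLIENT
    local key="$1:$2"
    case " ${readers[$key]} " in
        *" $3 "*) ;;        # client already recorded for this page
        *) readers[$key]="${readers[$key]:+${readers[$key]} }$3" ;;
    esac
}

invalidate_on_write() {     # invalidate_on_write OBJECT PAGE WRITER
    local key="$1:$2" c
    local -a stale=()
    for c in ${readers[$key]}; do
        if [ "$c" != "$3" ]; then
            stale+=("$c")   # this reader must be told to refresh the page
        fi
    done
    readers[$key]=$3        # only the writer's copy is known to be current now
    STALE="${stale[*]}"     # set a global: $(...) would run in a subshell
}
```

The function reports through the global `STALE` rather than echoing. A `$(...)` call would run in a subshell and lose the updated reader set.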

The userspace server will get parallel and distributed facilities. Parallel processing will come first. Lookup and readdir callbacks will be allowed to return information about objects, including the address of the server where each object is actually located, so that the object can be read, written or have its status checked there. Initially each whole file will be stored on a single server, i.e. the first implementation will not be able to store half of a file on one server and the other half on a different one. This can be extended later.
The server will get the ability to store data in different root directories (without the client being able to see the shadow copies). There will be simple regexp policies for data storage, for example '*.jpg' has to be stored in root1 and root2, '*.txt' only in root1, and so on. Each root directory can be local or a remote mount; the userspace server does not care about these issues.
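The policy examples map naturally onto a shell `case` statement. This is only a sketch of the idea. `roots_for` is a hypothetical helper, and the patterns are treated as shell globs, which is how '*.jpg' reads. Sending unmatched files to root1 is an assumption that the roadmap does not state.

```shell
#!/bin/sh
# roots_for NAME: print the root directories a file should be stored in.
# Hypothetical policy mirroring the roadmap's examples; the '*' fallback
# to root1 is an assumption, not something POHMELFS specifies.
roots_for() {
    case "$1" in
        *.jpg) echo "root1 root2" ;;   # mirrored: one copy in each root
        *.txt) echo "root1" ;;         # single copy
        *)     echo "root1" ;;         # assumed default for other files
    esac
}
```

A server following such a policy would then loop over `$(roots_for "$name")` and write one copy of the file under each listed root.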

The main part is already completed: I have a vision of what the system has to provide and how it will look, so with a good design of the low-level mechanisms it becomes a doable task in a predictable timeframe.

Stay tuned!

April 26, 2008 10:00 PM

Ted Tso: Organic vs. Non-Organic Open Source, Revisited

There’s been some controversy generated over my use of the terminology of “Organic” and “Non-Organic” Open Source. Asa Dotzler noted that it wasn’t Mozilla’s original intent to “make a distinction between how Mozilla does open source and how others do open source”. Nessance complained that he didn’t like the term “Non-Organic”, because it was “raw and vague - is it alien, poison, silicon-based?” and suggested instead the term “Synthetic Open Source”, referencing a paper by Siobhán O’Mahony, “What makes a project open source? Migrating from organic to synthetic communities”. Nessance referenced a series of questions and answers by Stephen O’Grady from Red Monk, where he claimed the distinction between the two doesn’t matter. (Although given that Sun is a paying customer of Red Monk, Stephen admits that this might have influenced his thinking and so he might be “brainwashed” :-).

So let’s take some of these issues in reverse order. Does the distinction matter? After all, if the distinction doesn’t matter, then there’s no reason to create or define specialized terminology to describe the difference. Certainly, Brian Aker, a senior technologist from MySQL, thinks it does, as do folks like me and Amanda McPherson and Mike Dolan; but does it really? Are we just saying that because we want to take a cheap shot at Sun?

Well, to answer that, let’s go back and ask the question, “Why is Open Source a good thing in the first place?” It’s gotten to the point where people just assume that it’s a good thing, because everybody says it is. But if we go back to first principles, maybe it will become much clearer why this distinction is so important.

Consider the Apache web server; it was able to completely dominate the web server market, easily besting all of its proprietary competitors, including the super-deep-pocketed Microsoft. Why? It won because a large number of volunteers were able to collaborate together to create a very fully featured product, using a “stone soup” model where each developer “scratched their own itch”. Many, if not most, of these volunteers were compensated by their employers for their work. Since their employers were not in the web server business, but instead needed a web server as a means (a critical means, to be sure) to pursue their business, there was no economic reason not to let their engineers contribute their improvements back to the Apache project. Indeed, it was cheaper to let their engineers work on Apache collaboratively than it was to purchase a product that would be less suited for their needs. In other words, it was a collective “build vs. buy” decision, with the twist that because a large number of companies were involved in the collaboration, it was far, far cheaper than the traditional “build” option. This is a powerful model, and the fact that Sun originally asked Roy Fielding from the Apache Foundation to assist in forming the Solaris community indicates that at least some people in Sun appreciated why this was so important.

There are other benefits of having code released under the Open Source license, such as the ability for others to see the implementation details of your operating system — but in truth, Sun had already made the Source Code for Solaris available for a nominal fee years before. And, of course, there are plenty of arguments over the exact licensing terms that should be used, such as GPLv2, GPLv3, CDDL, the CPL, MPL, etc., but sometimes those arguments can be a distraction from the central issue. While the legal issues that arise from the choice of license are important, at the end of the day, the most crucial issue is the development community. It is the strength and the diversity of the development community which is the best indicator for the health and the well-being of an Open Source project.

But what about end-users, I hear people cry? End users are important, to the extent that they provide ego-strokes to the developers, and to the extent that they provide testing and bug reports to the developers, and to the extent that they provide an economic justification to companies who employ open source developers to continue to do so. But ultimately, end-users affect an open source project only in a very indirect way.

Moreover, if you ask commercial end users what they value about Open Source, a survey by Computer Economics indicated that the number one reason why customers valued open source was “reduced dependence on software vendors”, which end users valued 2 to 1 over “lower total cost of ownership”. (Which is why Sun Salescritters who were sending around TCO analysis comparing 24×7 phone support from Red Hat with Support-by-email from Sun totally missed the point.) What’s important to commercial end users is that they be able to avoid the effects of vendor lock-in, which implies that if all of the developers are employed by one vendor, it doesn’t provide the value the end users were looking for.

This is why whether a project’s developers are dominated by employees from a single company is so important. The license under which the code is released is merely the outward trappings of an open source project. What’s really critical is the extent to which the development costs are shared across a vast global community of developers who have many different means of support. This saves costs to the companies who are using a product being developed in such a fashion; it gives choice to customers about whether they can get their support from company A or company B; programmers who don’t like the way things are going at one company have an easier time changing jobs while still working on the same project; it’s a win-win-win scenario.

In contrast, if a project decides to release its code under an open source license, but nearly all the developers remain employed by a single company, it doesn’t really change the dynamic compared to when the project was previously under a closed-source license. It is a necessary but not sufficient step towards attracting outside contributors, and eventually migrating towards having a true open source development community. But if those further steps are not taken, the hopes that users will think that some project is “cool” because it is under an open-source license will ultimately be in vain. The “Generation Y”/Millennial Generation in particular are very sensitive indeed to Astroturfing-style marketing tactics.

Ok, so this is why the distinction matters. Given that it does, what terms shall we use? I still like “Organic” vs “Non-organic”. While it may not have been intended by the Mozilla Foundation, the description in their web page, “only a small percentage of whom are actual employees [of the Mozilla Foundation]”, is very much what I and others have been trying to describe. And while I originally used the description “Projects which have an Open Source Development Community” vs “Projects with an Open Source License but which are dominated by employees from a single company”, I think we can all agree these are very awkward. We need a better shorthand.

Brian Aker from MySQL suggested “Organic” vs “Non-Organic” Open Source, and I think those terms work well. If some folks think that “Non-Organic” is somehow pejorative (hey, at least we didn’t say “genetically modified Open Source” :-), I suppose we could use Synthetic Open Source. I’m not really convinced that is any more appetizing, myself, however.

So what would be better terms to use? Please give me some suggestions, and maybe we can come up with a better set of words that everyone is happy with.

Originally published at Thoughts by Ted. Please leave any comments there.

April 26, 2008 04:27 AM

April 25, 2008

Dave Jones: Meat Beat Manifesto.

In comparison to last week's excursion to the Middle East, last night's gig was many kinds of awesome. Meat Beat Manifesto put on a really cool visual show. Total sensory overload. Of course, the music was also really good, with a mix of new stuff I hadn't heard before, along with a bunch of old favorites like Helter Skelter, Radio Babylon etc.

I managed to find a really good spot to take a ton of photos. Somehow I even managed to stay in a section that was later cordoned off without being evicted by the security guys like everyone else. Maybe they thought I was the band photographer or something. I wasn't complaining.

At the end of the night, I was somewhat relieved to not have a repeat of the craziness after last week's show. Autechre had just walked off stage, and the security guys were marshaling us upstairs so they could close that part of the club. Ajax and I still had some beer left, so we took a seat, and finished. After which, we found out that we still had time for another, so the inevitable ensued. We were barely halfway through drinking when we were told we had to leave. Immediately. "Sure" we said, "just need to go to the bathroom". I waited outside the bathroom whilst Ajax did his business. Shortly afterward, he reappears, and I enter. At this point, things go a bit weird. The lights go out. "Hey" I shout. The lights go back on. A few seconds later, they go off again. I holler again, but this time, no reactivation of the lights. Being a resourceful type, I happened to have a torch on my keyring. So I'm finishing up by torchlight, and finally emerge from the bathroom. Into complete darkness. All the lights in the place are out. Not a soul to be seen. "Strange" I think, and head to the door. Locked. Apparently from the outside. I wander around for a while shouting. No response. I wandered some more pushing doors to no avail, and then saw the fire exit. "This has to open" I thought, but then I hesitated. "What if an alarm goes off?". I pondered my quandary for a moment, before thinking to hell with it, and shoved the door open, and fled to the street, to bump into Ajax again who was wondering where the hell I had got to.

Crazy.

And then, we had pancakes. And all was right with the world again. There is no problem that ihop can't solve.

April 25, 2008 05:11 PM

Evgeniy Polyakov: POHMELFS release.

Vodka and beer together are glad to provide a new POHMELFS release for you.

POHMELFS stands for Parallel Optimized Host Message Exchange Layered File System.

This is a high performance network filesystem with a local coherent cache of data and metadata.
Its main goal is distributed parallel processing of data. The network filesystem is a client transport.
The POHMELFS protocol has proven superior to NFS in many operations (if not all of them, then that is on the roadmap).

Basic POHMELFS features:

Roadmap includes:

One can grab the sources from the archive or check the homepage.

Enjoy!

P.S. Off to listen to the blues and drink a beer.

April 25, 2008 04:00 PM

Ted Tso: Links — 2008-04-25

The Open Source Commandments
Really good ideas that companies should take to heart.
Open Source Commandments II: Passover Penguins
More really good ideas, especially for companies like Sun…
Did Canonical Just Get Punked by Red Hat and Novell?
Interesting thoughts about Linux desktop strategies
rPath to OEM SUSE Linux Enterprise Server from Novell for Appliances
I know a bunch of the folks at rPath, and I very much respect their technology; I think this is a very good thing for them.
Does Microsoft CEO Steve Ballmer need an intervention?
Does anyone think a Microsoft/Yahoo merger makes sense besides Mr. Ballmer?

Originally published at Thoughts by Ted. Please leave any comments there.

April 25, 2008 02:55 PM

Val Henson: Even the criminals have MySpace pages now

Back in 1995 when the web was just getting off the ground, I went off to university and didn't watch TV for a couple of years. So the next time I turned on the TV, I was immediately struck by the fact that all the car ads had URLs now.

Something similar happened to me this week. I'm down in L.A. visiting a client fortuitously located across from a building covered in the most amazingly beautiful graffiti. One section has a URL for the artists' MySpace page:

I started keeping an eye out for others:

Check out the Google Street View for more awesome graffiti.

April 25, 2008 03:13 AM

April 24, 2008

Evgeniy Polyakov: Solaris vs 'Have you ever kissed a girl?'

As started by Ted Tso.

We forgot the answer:

No, but I can kiss the sky
He was 22 in those days? :)

From my developer's point of view, Solaris sucks first of all because of its contributor agreement. There is no way I can devote my time to an organization that will get my work for free and do whatever it wants with it without asking my opinion as the author. (Actually, the same applies to BSD-style licenses to some degree. Yes, that may be trivial greediness.)

It is not _that_ bad an OS, but modern medicine has no known practice of raising the dead.
Slolaris has its niche, but that's it. Linux can be tuned to be faster in those areas (and if it has bugs there, they can be fixed), but that does not matter: the people who make decisions already know what they want.

The pseudo-openness of Solaris is just marketing noise. Those who want to hear it will hear just that, no matter how things are in real life.

April 24, 2008 09:00 PM

Evgeniy Polyakov: Second POHMELFS release.

It is scheduled for tomorrow; today I have to prepare myself for it. The whole idea and implementation started during fun New Year vacations, so I have to repeat the process, at least to some degree...

This release will not include direct writing to userspace from an async thread, since this approach turned out to be really non-trivial. What I described for page fault handling works only for the first fault: once the page is populated into the table, it can be referenced and written to, and things just work. The problem happens when the same page is used for a second read (i.e. a new attempt from userspace; for example, if the size of the written data is increased to more than two pages, 'cat' will reuse the same two pages to read the data). With the second write from the kernel there is a page fault again, although the page exists in the table, and the fault cannot be handled (at least its cause is never removed, so it happens again and again), since the page table entry looks fine to the system, but not to the CPU.
I checked two cases: the usual copy_to_user() from the kernel on behalf of a userspace thread that invoked a read syscall, and the same code with the copy performed from a different thread. The page table entry (pte) looks very similar in both cases (at least with regard to all the flags), but a fault always happens on the second write into the same page when the thread's mm context was changed to point to the original userspace one.
This does not change whether or not the userspace thread was scheduled away from its CPU.
The difference from get_user_pages() here is mainly that the resulting page is locked in the kernel (at least by increasing its reference counter), but I still want to reproduce the same behaviour as a usual page fault during a copy on behalf of a userspace thread.
So I am stuck with this problem, but since it is very interesting, I will find a solution.

Meanwhile, this release will include the following things:

Stay tuned!

April 24, 2008 08:00 PM

Ted Tso: Organic vs. Non-organic Open Source

Brian Aker dropped by and replied to my previous essay by making the following comment:

I believe you are hitting the nail on the “organic” vs “nonorganic” open source. I do not believe we have a model for going from one to the other. Linux and Apache both have very different models for contribution… but I don’t believe either are really optimized at this point.

Optimization to me would lead to a system of “less priests” and more inclusion.

I made an initial reply as a comment, and then decided it was so long that I should promote it to a top-level post.

I assume that when Brian talks about “organic open source” what he means is what I was calling an “open source development community”. Some googling turned up the following definition from Mozilla Firefox’s organic software page: “Our most well-known product, Firefox, is created by an international movement of thousands, only a small percentage of whom are actual employees.”

This puts it in contrast with “non-organic” software, where all or nearly all of the developers are employed by one company. (And anyone who proves talented at adding features to that source base soon gets a job offer from that one company. :-) By that definition we can certainly see projects like Wine, MySQL, Ghostscript (at one time), and others as fitting into that model, and being quite successful. There’s nothing really wrong with the non-organic software model, although many of them have struggled to make enough money when competing with pure proprietary software competitors, with MySQL perhaps being the exception which proves the rule.

In most of these cases, though, the project started more as an organic open source, and then transitioned into the non-organic model when there was a desire to monetize the project — and/or when the open source programmers decided that it would be nice if they could turn their avocation into a vocation, and let their hobby put food on the family table.

Solaris, of course, is doing something else quite different, though. They are trying to make the transition from a proprietary customer/supplier relationship to trying to develop an Open Source community — and what Jon’s candidate statement pointed out is that they weren’t really interested in creating an organic open source developer community at all, but they wanted the fruits of an open source community — with plenty of application developers, end-users, etc., all participating in that community.

We don’t have a lot of precedent for projects who try to go in this direction, but I suspect they are skipping a step when they try to go to the end step without bothering to try to make themselves open to outside developers. And by continuing to act like a corporation, they end up shooting themselves in the foot. For example, the OpenSolaris license still prohibits people from publishing benchmarks or comparisons with other operating systems. Very common in closed-source operating systems and databases, but it discourages people from even trying to make things better, both within and outside of the Open Solaris core team. Instead, they respond to posts like David Miller’s with “Have you ever kissed a girl?”. (Thanks, Simon, for that quote; I had seen it before, but not for a while, and it pretty well sums up the sheer arrogance of the Open Solaris development team.)

So while Linux may not be completely optimized in terms of “less priests” and more inclusion, at least over 1200 developers contributed to 2.6.25 during its development cycle. Compared to that, Open Solaris is positively dominated by “high priests” with a “you may not touch the holy-of-holies” attitude; heck, they won’t even allow you to compare them to other religions without branding you a heretic and suing you for licensing violations!

Originally published at Thoughts by Ted. Please leave any comments there.

April 24, 2008 06:13 PM

Dave Jones: random thoughts from a random (jetlagged) mind.

Post traveling epilogue.

April 24, 2008 01:57 PM

Matthew Garrett:

Yes, because getting offended when people voice their opinion that women belong chained to the sink or that (insert race here) are genetically inferior or (insert sexual practice here) indicates dangerous deviancy is a clear sign of lack of personal integrity and self-awareness.

I give up. Life is significantly better at parody than I am.

April 24, 2008 11:08 AM

Zwane Mwaikambo: Let Them Eat Cake

You might want to skip this if you don’t want to read a politically and sociologically insensitive post. I had the strangest dialog with a gentleman outside the local supermarket “pan-handling” (that sounds so much better than begging).

pan-handler: Got any spare change?
zwane: Sorry i’m afraid not. (i generally don’t carry cash)
pan-handler: I’m hungry, can you buy me some food?
zwane: Hmm well how about a loaf of bread? (I was offering him my bread)
pan-handler: Er, nah it’s ok, i’ll wait for someone to buy me something.

This proves that even pan-handlers have options in socialist countries! Or perhaps he was concerned that the vicar and i won’t have sandwiches to go with our tea time discussion.

April 24, 2008 03:09 AM

Matthew Garrett:

Dear Russell,

You <------------------------------------------>Perth<---------------------------------->The point

Kthxbi,

Matthew

Updated: This makes me think that there ought to be a moon somewhere in the above diagram.

April 24, 2008 02:25 AM

Dave Jones: vacation over.

Got back a few hours ago from my vacation in England for the last week. Got up this morning to face London in the pouring rain. By comparison, when I finally got back to Boston, I had to remove various articles of clothing to avoid passing out in the heat. A pleasant surprise to return to. It was great catching up with people in England, and not really doing much (isn't that the whole idea of vacations?), but the color of everything there was just slightly not good enough. The weather was pretty miserable the whole week, save for an occasional gap in the clouds.

Tomorrow is definitely a "do nothing much except read email" day. There's just ridiculous amounts of it.
Also, due to ineptitude on my part, I just accidentally deleted a whole swathe of it unread. If you sent me something important, and I don't reply in a day or two, chances are it was in that batch, and you should resend.

April 24, 2008 01:34 AM

April 23, 2008

Jaya Kumar: In the big city

Got to Bangalore yesterday. Looks like the traffic is heavier, but for the good reason that the metro is being built. Yesterday was also the last day for all the politicians to file some kind of electoral papers, so traffic seemed extra jammed. IPL "warrior" hoardings everywhere too. Lots of new flyovers too. I walked around Richmond Road in the morning and there are flyovers everywhere, which I don't think I remember from the last time.
Walked around at night and got to this place called Ista. Very swanky new hotel with a nice bar, but everything is so expensive. Rs 200 for mango juice! Very spacious and quiet. Nice. But really, I think most cheapskates like me would prefer a simple, clean, inexpensive no-frills hotel rather than pay the new-age-western-swank-factor surcharge.

April 23, 2008 07:49 PM

Pete Zaitcev: Ted Tytso on [Open]Solaris

Ted suddenly decided to talk about OpenSolaris. Pretty interesting... at least for me, since I spent the 7 best years of my life in Sun's orbit.

In passing, aside from the bulk of the post, it seems to me that the final argument, about competitors selling Solaris support, does not hold water. This is exactly what Oracle attempted with their clone of CentOS and they weren't very successful, despite having a strong Linux team under Wim.

Other than that, he's probably right. But he's going to get responses. Whenever I mention Solaris (last time it was when I linked to Jeff Bonwick's blog), I get the most inane responses from Solaris fanboys. It looks like a very vocal community of users, if not contributors. Sounds like Apple almost.

This puts the damper on any dreams I may have about re-living the glory of my youth by getting back to hacking on that codebase.

UPDATE: Not sure why Levon decided to post his reply to his personal blog instead of the one at Sun. Surely the other one is more relevant?

April 23, 2008 06:12 PM

Matthew Garrett:



Note that by "Enable Touchpad" it also means "Make my touchpad mouse buttons work". For machines that are only equipped with a touchpad (and, indeed, some that are equipped with a trackpoint as well - it depends on how it's wired up), the only option is to figure out that the space bar or enter will save you.

(Ubuntu-specific change, introduced in 8.04)

April 23, 2008 04:31 PM

Matthew Garrett:

Oh, and for the love of christ. Australian cultural norms are the correct standard for determining what's acceptable for posting on a site that has clear guidelines against offensive content? Baby Jesus cries with as much pain as if he'd been suddenly violated in an utterly inappropriate way.

April 23, 2008 12:39 AM

Matthew Garrett:

...

Leaving aside the obvious failure (asking women to classify themselves into those who are happy to have their breasts groped and those who aren't - and yes, that is the effective choice, since refusing to participate will result in people making conclusions about which of those categories you fit into), there's the whole "open-source breast" thing.

Creating copyrightable works is a choice. Having breasts is not[1]. Attempting to equate the two is not reasonable. Applying "open-source" to things that are not directly comparable to copyrightable works is like applying "Darwinism" to things that are not directly comparable to natural selection operating upon living things. Discussing Darwinistic politics is likely to result in people concluding that you're either a libertarian fuckloon or blame atheists for the death of your cat. Discussing open-source breasts is likely to result in me stabbing you in the fucking eye. Don't make the baby Jesus cry when he sees your horribly mutilated face. Just accept that open source is a useful licensing model, not an entire way of life.

Or: get your hands off my adjective or you'll find my hands on things you really don't want groping, whether you're wearing a badge or not.

(And yes, I realise that this is not the offensive aspect of things. This says it better than I could)

(Previously)

(Previously previously)

[1] There's a set of basically uninteresting exceptions. And by "uninteresting" I mean "If you are trying to argue over this, then you are missing the fucking point"

April 23, 2008 12:12 AM

April 22, 2008

Evgeniy Polyakov: Debunked copy_to_user() from kernel thread problem.

It turned out to be really trivial. No VM hacking at all :(

First, some background on how copy_to_user() works on x86.
Its asm looks pretty simple (and it is very small; see __copy_user() in arch/x86/lib/usercopy_32.c), so I always wondered how it could handle a missing-page exception when the userspace page was swapped out.

The trick lives in a small part of the function: .section __ex_table. Each entry of this table contains two values: the place where the exception happened and the fixup address (both are just instruction positions). The linker puts this table into a special section, accessible to the page fault handler do_page_fault(). In some cases the page fault path is never executed; the code just searches for the page and locks it, even if it is already in the table (that is why get_user_pages() is at best as fast as copy_to_user()). This happens when the WP bit is not set and does not work (only a speculation, though, derived from __copy_to_user_ll() and the Intel F00F bug errata).

When the WP bit works, we have the usual copy_to_user(), which faults if there is no destination page, and do_page_fault() is eventually called. After a number of checks, the system determines that this is an exception in kernel mode, and if there is an entry in the exception table described above (which is true for copy_to_user()), it tries to fix things up.
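To make the table lookup concrete, here is a minimal userspace model in C, not kernel code. Each entry pairs a possibly faulting instruction address with a fixup address; the table is sorted once and then binary-searched (as far as I know, the kernel treats its linker-collected __ex_table the same way). The names `struct ex_entry`, `sort_ex_table()`, `search_ex_table()` and `handle_kernel_fault()` are hypothetical. In this model, following my reading of the x86 fault handler of that era, a fault the handler can resolve simply restarts the instruction, and the fixup address is used only when it cannot, which is what lets copy_to_user() return a short count instead of crashing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of one exception table entry: the address of an
 * instruction that may fault and the address to resume at instead. */
struct ex_entry {
    uintptr_t insn;
    uintptr_t fixup;
};

enum fault_outcome { RETRY_INSN, JUMP_TO_FIXUP, OOPS };

static int cmp_ex_entry(const void *a, const void *b)
{
    const struct ex_entry *x = a;
    const struct ex_entry *y = b;
    return (x->insn > y->insn) - (x->insn < y->insn);
}

/* Sort once by instruction address so lookups can use binary search. */
void sort_ex_table(struct ex_entry *tbl, size_t n)
{
    qsort(tbl, n, sizeof(*tbl), cmp_ex_entry);
}

/* Return the entry whose insn equals the faulting address, or NULL. */
const struct ex_entry *search_ex_table(const struct ex_entry *tbl, size_t n,
                                       uintptr_t addr)
{
    size_t lo = 0, hi = n;

    assert(tbl != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tbl[mid].insn == addr)
            return &tbl[mid];
        if (tbl[mid].insn < addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}

/* Simplified kernel-mode fault decision: if the missing page can be mapped,
 * the faulting instruction is restarted; otherwise a table hit moves the
 * instruction pointer to the fixup code, and a miss is a kernel bug. */
enum fault_outcome handle_kernel_fault(const struct ex_entry *tbl, size_t n,
                                       uintptr_t *ip, int page_can_be_mapped)
{
    const struct ex_entry *e;

    if (page_can_be_mapped)
        return RETRY_INSN;
    e = search_ex_table(tbl, n, *ip);
    if (e == NULL)
        return OOPS;
    *ip = e->fixup;
    return JUMP_TO_FIXUP;
}
```

Replacing the instruction pointer, rather than returning an error code, is the point of the design: the copy loop itself needs no per-byte checks, and only the rare failing access pays for the table search.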

Here we come to essentially the same code that get_user_pages() calls: we locate the VMA for the faulting address and insert a new page into the page table. This involves allocating all those strange three-letter abbreviations: pgd, pud, pmd and pte ('and' is not a VMM abbreviation yet). I know what two or three of them mean, but completely forgot pud; with a 4-level page table it is hard to recall which two are the same, since iirc x86 has only 3 levels.
If the page was swapped out, it will be brought back, and eventually the fault handler will try to fix things up via fixup_exception(), which replaces EIP with the appropriate value from the exception table described above, so that the CPU returns to the __copy_user() code and continues its execution (or not, depending on whether the page exists).
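For reference, the four levels in Linux's generic page-table code are pgd (page global directory), pud (page upper directory), pmd (page middle directory) and pte (page table entry). On 32-bit x86 the kernel folds the unused levels away (two levels without PAE, three with PAE), while x86-64 walks all four. The sketch below is a minimal illustration, assuming x86-64 with 4 KiB pages, of how a virtual address splits into one 9-bit index per level plus a 12-bit page offset. `split_va()`, `is_canonical()` and `struct va_indices` are hypothetical names, not the kernel's own helpers such as pgd_index().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumes x86-64 4-level paging with 4 KiB pages: a 12-bit page offset
 * followed by four 9-bit table indices, 48 significant bits in total. */
#define PAGE_SHIFT 12
#define LEVEL_BITS 9
#define LEVEL_MASK ((1u << LEVEL_BITS) - 1)

struct va_indices {
    unsigned pgd;    /* page global directory index, bits 39..47 */
    unsigned pud;    /* page upper directory index,  bits 30..38 */
    unsigned pmd;    /* page middle directory index, bits 21..29 */
    unsigned pte;    /* page table entry index,      bits 12..20 */
    unsigned offset; /* byte offset within the page, bits 0..11  */
};

/* Bits 48..63 must all be copies of bit 47 ("canonical" address). */
bool is_canonical(uint64_t va)
{
    uint64_t top = va >> 47;
    return top == 0 || top == 0x1ffff;
}

/* Split a canonical virtual address into its per-level table indices. */
struct va_indices split_va(uint64_t va)
{
    struct va_indices ix;

    assert(is_canonical(va));
    ix.offset = (unsigned)(va & ((1u << PAGE_SHIFT) - 1));
    ix.pte = (unsigned)((va >> PAGE_SHIFT) & LEVEL_MASK);
    ix.pmd = (unsigned)((va >> (PAGE_SHIFT + LEVEL_BITS)) & LEVEL_MASK);
    ix.pud = (unsigned)((va >> (PAGE_SHIFT + 2 * LEVEL_BITS)) & LEVEL_MASK);
    ix.pgd = (unsigned)((va >> (PAGE_SHIFT + 3 * LEVEL_BITS)) & LEVEL_MASK);
    return ix;
}
```

Each index selects one slot in the table at its level, so handling a fault in a fresh region may need to allocate a new pud, pmd and pte page before the data page itself can be mapped.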

So, how do we hook into the above mechanism and allow a completely different process to write data into userspace? Quite trivially: the above fixup (VMA searching and three-letter-abbreviation allocations) happens for a particular mm_struct, which contains the VMA list, the page table lock and other (likely very) essential information needed to handle memory management. This structure is obtained from the current thread executing on the CPU, so by replacing the mm_struct in our kernel thread with the userspace thread's one, we can safely copy data to and from userspace. There is a race, of course, when the userspace thread wants to access its own mm_struct (now shared with the kernel thread), for example by calling mmap() or copy_*_user() from the kernel, so we have to be careful and properly guard against that.
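The mm_struct swap can be modelled in userspace too. The sketch below is a hypothetical stand-in, not kernel code: `struct fake_mm`, `struct fake_task`, `borrow_mm()`, `return_mm()` and `fake_copy_to_user()` are invented names. The copy routine resolves its destination only through the calling task's mm pointer, the way copy_to_user() works against the current task's mm, so a kernel thread that borrows the user's mm can write into the user's buffer. Holding the mm's lock for the whole borrow is just one blunt way to guard against the race mentioned above; the real kernel relies on its own reference counting and locking.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for mm_struct: one pretend userspace buffer plus
 * a lock that serializes everyone operating on this address space. */
struct fake_mm {
    pthread_mutex_t lock;
    char *user_buf;
    size_t user_len;
};

/* Hypothetical stand-in for a task: memory accesses go through task->mm. */
struct fake_task {
    struct fake_mm *mm;
};

/* The kernel thread temporarily adopts the user's mm, holding its lock. */
void borrow_mm(struct fake_task *kthread, struct fake_mm *mm)
{
    assert(kthread->mm == NULL); /* kernel threads normally have no mm */
    pthread_mutex_lock(&mm->lock);
    kthread->mm = mm;
}

/* Drop the borrowed mm and release the lock taken in borrow_mm(). */
void return_mm(struct fake_task *kthread)
{
    struct fake_mm *mm = kthread->mm;

    assert(mm != NULL);
    kthread->mm = NULL;
    pthread_mutex_unlock(&mm->lock);
}

/* Copy into the "userspace" of whatever mm the task currently has.
 * Like copy_to_user(), returns the number of bytes NOT copied. */
size_t fake_copy_to_user(struct fake_task *task, size_t off,
                         const void *src, size_t len)
{
    size_t room, n;

    if (task->mm == NULL || off > task->mm->user_len)
        return len;
    room = task->mm->user_len - off;
    n = len < room ? len : room;
    memcpy(task->mm->user_buf + off, src, n);
    return len - n;
}
```

Without a borrowed mm the copy fails completely, and with one it lands in the owner's buffer, which mirrors why swapping the mm_struct pointer alone is enough for the fault path to work.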

Example code that copies to userspace from a kernel thread can be found in the archive. Just replace the kernel path in the Makefile with your own, run make and insert the module.
Each read from the /dev/tcopy file results in a copy of data from the kernel to userspace in a dedicated kernel thread.

April 22, 2008 01:00 PM