Notes for Rick Moen's SVLUG talk of Wednesday, Feb. 2, 2011 on The Wild, Wild Web: Web Browser Security, Performance, and Privacy
(Random network humour: An IPv4 address space walks into a bar, and says, 'Bartender, a strong CIDR, please. I'm exhausted.')
Slide 2. Some of us remember the early Web, some more than others. A few here might even remember when Gopher was a competing service. Anyone? Christmas 1990 is famously when Tim Berners-Lee at CERN created the first Web server (CERN httpd) and browser (WorldWideWeb); the Web then got off to a slow start for several years. The very first Web pages are still around, archived at the W3 Consortium: http://www.w3.org/History/19921103-hypertext/hypertext/WWW/TheProject.html Nice if you had a NeXT box, but Tim's browser was written in Objective-C for NeXTStep and didn't run on anything else. In 1991, Nicola Pellow wrote Line Mode Browser, a text browser vaguely like the modern-day Lynx browser, which ran on anything -- and you could telnet into it at info.cern.ch. Line Mode Browser's code was abstracted into the libwww library, which then became the backbone of an _e-mail_ Web browser called Agora. Yes, Web browsing by sending mail and getting mail back. Wow. That wasn't very impressive, so two graphical browsers followed in 1992: Erwise and ViolaWWW, which were finally semi-serious. Lynx followed shortly after -- and is still successful. Where the Web really took off was with Marc Andreessen's Mosaic browser in 1993, developed at NCSA at the U. of Illinois at Urbana-Champaign.
Slide 3. The first two conferences about the Web were held in 1994 at CERN and then MIT. They established that the Web would be royalty-free, which was the death-knell for Gopher, because Gopher wasn't. However, it was also becoming clear that Berners-Lee's W3C wasn't really in charge: the Web was booming out of anyone's control. By 1996, having a Web site for just about anything was de rigueur.
Slide 4. Microsoft Windows users got a slow start on the Web via the Cello Web browser (1993), then Mosaic for Windows. Mosaic was free for non-commercial use, the sort of 'academic' license then still popular at universities other than UC Berkeley and MIT: the school administration planned to popularise the software through personal use and then sell commercial-use licenses to businesses. In 1994, the newly graduated Andreessen co-founded Mosaic Communications Corporation to commercialise Mosaic, issuing a beefed-up Mosaic that was initially called Mozilla. The U. of Illinois complained about trademark violation, and the firm and its product were promptly renamed Netscape -- but the release README files continued to say 'Remember, it's spelled N-e-t-s-c-a-p-e, but it's pronounced Mozilla.'
Slide 5. And this is where things really started going sideways. Companies like DoubleClick (founded 1995 as 'Internet Advertising Network') convinced corporate Web sites to include links to third-party 'Web bug' 1x1-pixel images on their Web pages; fetching each such image caused the user's browser to set or update an HTTP cookie for the third-party domain. Standards bodies said, correctly, that this was an abuse of the Web -- and were ignored. Some browsers started including a feature to reject requests to read or write third-party cookies -- cookies pertaining to a site other than the one being browsed -- and retain that feature to this day. Also, users started using utilities like Cookie Cutter to whack them away. It's important to note that HTTP cookies are perfectly legitimate, being just a place to store modest amounts of browser connection or user state. We'll get into that.
Slide 6. The early 1990s were not only the time the Web was taking off, but also when everyone and his brother was cross-connecting businesses' LANs to the Internet. We network consultants got a lot of revenue out of that. One consequence was that network security, by and large, became rather more primitive at the same time.
Few of the newer cross-connects were cautious enough to use application proxy gateways, even quite standard ones like SOCKS gateways or Marcus Ranum's Trusted Information Systems Firewall Toolkit. Instead, they deployed simple routers with IP address/port filtering, which to this day is what 'firewall' means to most people. The Web had become so popular, in such a hurry, that those IP/port filters almost universally treated any connection to the Web's assigned port, TCP port 80 (and 443 for https, and 8080 as http's alternate port), as 'safe', and let it through with little or no scrutiny. In consequence, port 80 is a prime source of badness -- since it's treated as sacrosanct. Many of the extremely paranoid firms where I've done work attempt to block their employees' access to damned near all useful outside Internet services, which initially created problems for me when I needed to fetch things from my Internet server via ssh and scp -- until it occurred to me to add port 8080 to the list of ports my ssh daemon answers on. Since then, no problems.
Slide 7. The HTTP protocol is said to be 'stateless', in the sense that the server sends the client stuff, and then the connection and session are over and done with. There's no _built-in_ persistent memory, at either the server or the client end, of an HTTP transaction after it's completed. But this didn't meet the needs of Web designers, so it was pretty much immediately engineered around. Here I list the three basic ways that state is managed with this 'stateless' protocol. If you log in to a Web site, say, that is state, and it must be associated with you and your browser session.
Slide 8. Storing state is good. Abusing state, less good. There's an entire industry built around the latter. All of those companies, and all of the companies that hire them to collate information about you and your Internet activity, have 'privacy policies' in which they disclaim any interest in you personally.
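The three techniques on the Slide 7 slide aren't spelled out in these notes, but the most familiar one is the HTTP cookie. Here is a minimal sketch, using only Python's standard http.cookies module, of a server issuing a session cookie at login and recognising it on later requests. The SESSIONS table and both function names are hypothetical; a real site would persist its sessions and also mark the cookie Secure under https.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, List, Optional, Tuple

# Hypothetical server-side session table: session ID -> user name.
SESSIONS: Dict[str, str] = {}


def login_response_headers(username: str) -> List[Tuple[str, str]]:
    """Start a session and return the Set-Cookie header to send at login."""
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = username
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["path"] = "/"
    cookie["session"]["httponly"] = True  # hide it from page Javascript
    return [("Set-Cookie", cookie["session"].OutputString())]


def user_for_request(cookie_header: str) -> Optional[str]:
    """Map the Cookie header of a later request back to the logged-in user."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session")
    return SESSIONS.get(morsel.value) if morsel is not None else None


if __name__ == "__main__":
    header_value = login_response_headers("rick")[0][1]
    # The browser sends back only the name=value part on later requests.
    name_value = header_value.split(";", 1)[0]
    print(user_for_request(name_value))  # rick
```

The server keeps the real state; the cookie is just a claim ticket the browser hands back. A third-party tracker works the same way, except that the cookie belongs to the domain serving the Web bug rather than to the site you're actually reading.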
The pioneering firm, DoubleClick, was deemed tasty enough by 2008 that Google, Inc. bought it for $3.1 B.
Slide 9. For an industry that claims to have utterly no interest in your personal information, the Web-metrics and Internet advertising industry sure puts a lot of effort into making sure that it can keep continuous tabs on you. People who've studied their methods have found them using some very clever ways to defeat users' efforts to lop off tracking data left in browser storage. In 2010, Samy Kamkar created a demonstration project called Evercookie, which is a thing of beauty: it shows how all known tracking methods can be combined to make such data persist, even in the face of determined efforts to purge it. You delete its HTTP cookie; it bounces right back, recreated from the Flash cookie. Delete the Flash cookie, and it gets immediately recreated from data stored in your Web history. And so on.
Slide 10. I can't prove it, but through pragmatic experimentation, it's seemed to me that Javascript abuse has in recent years been _the_ keystone technology behind several kinds of badness on the Web -- spying on users, publication of malware and zombification of Windows workstations, etc. This abuse is possible because Javascript implementations tend to be way, way overfeatured and to involve very complex, brittle, crash-prone code, and because browsers are very trusting about what Javascript they're willing to run -- by default, any.
Slide 11. Why care? The earliest reason I started caring was browser performance and stability. The ex-DTP-specialist webmasters of the 1990s put ridiculous rubbish into HTML as if they were doing page layout, and actually tended to get upset if you told them you'd configured your browser to ignore or filter out any of it, e.g., to use your own fonts, use a local stylesheet, etc. -- which made me wonder 'Whose browser is it, anyway?'
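The Evercookie trick described under Slide 9 is easy to model. This is a toy sketch, not Kamkar's code: each 'store' is just a Python dict standing in for the HTTP cookie, the Flash cookie, and Web history, and the class name is made up for illustration. Whenever the tracker gets to run, it picks up any surviving copy of its ID and re-seeds all the others.

```python
import uuid
from typing import Dict, List


class RedundantTracker:
    """Toy model of the Evercookie idea (not Kamkar's code): one tracking ID
    copied into several stores, re-seeded from whichever copy survives."""

    def __init__(self, store_names: List[str]) -> None:
        # Each store is a plain dict standing in for a real browser store.
        self.stores: Dict[str, Dict[str, str]] = {n: {} for n in store_names}

    def identify(self) -> str:
        # Look for any surviving copy of the ID; mint a new one only if none.
        survivors = [s["id"] for s in self.stores.values() if "id" in s]
        tracking_id = survivors[0] if survivors else uuid.uuid4().hex
        # Write the ID back everywhere, undoing any partial deletion.
        for store in self.stores.values():
            store["id"] = tracking_id
        return tracking_id


if __name__ == "__main__":
    tracker = RedundantTracker(["http_cookie", "flash_cookie", "web_history"])
    first = tracker.identify()
    tracker.stores["http_cookie"].clear()   # user deletes the HTTP cookie
    tracker.stores["flash_cookie"].clear()  # ...and the Flash cookie
    print(tracker.identify() == first)      # True: restored from web_history
```

Deleting stores one at a time never wins, because the tracker gets to run again in between. Only purging every store at once, or never running the tracker's code in the first place, breaks the chain.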
By the late 1990s, Linux Web browsing using Netscape Communicator was a bit painful, largely because Communicator segfaulted at the drop of a hat. The major cause was Microsoft FrontPage, the worst offender at abusing HTML tables for layout: tables that were complex enough caused Communicator to die. The Mozilla Project (1998) arrived just in time to save that situation, but the FrontPage misadventure raised the obvious question of whether Web browsing could be improved by filtering out rubbish and making the browser refuse to honour requests against the user's interest. Browsers started gaining popup blockers and settings to refuse third-party-site cookies. Users patched their browsers to make animated GIFs run through their cycles only once and then stop. Advertising kept increasing, and the tracking firms moved away from relying on just HTTP cookies. Suddenly, major sites' pages started serving up Javascript snippets from eight or ten third-party sites you'd never heard of. (Look at www.time.com, for example.) Browser spyware made its debut, and usually seemed to centrally involve Javascript. At the same time, I'd already taken measures at a lower level in the network stack to curtail the madness: my DNS server ns1.linuxmafia.com is 'authoritative' for a few dozen obnoxious domains -- tracking and advertising, mostly -- which thereby vanish for me and for anyone at my house who's using my DNS. Just the sheer number of obnoxious domains suggested to me the scope of the larger problem.
Slide 12. This slide is there because otherwise I'd be getting questions from fans of all of those (plus Epiphany, Konqueror, and so on). The interesting case is Chromium. Good browser, that. Keep considering it, as it's here to stay. My remark about the extensions interface relates to, if memory serves, comments by the fellow behind Adblock Plus about how Chromium's interface limits what can be done compared to Firefox. Firefox, Iceweasel, essentially the same thing.
The Debian wrangle that gave rise to the latter highlights an ongoing problem with all major browsers on Linux: release and maintenance. Browsers need to be maintainable by Linux distributions and re-released at will without having to clear that with someone else, and need to be modifiable to comply with distro policies. I mention Swiftweasel (and Swiftfox, which inspired it) to highlight one other issue with major browsers: it'd be nice if their code were compiled with better optimisation, and with standard techniques to protect against buffer overflows.
Slide 13. Here are the six extensions I urge you to consider using with Firefox. The version of these slides I used for my talk included CustomizeGoogle, a Firefox extension that became unmaintained in 2008 and has since been installable only with some manual hacking (which my slides described). Fortunately, after my talk, an SVLUG member let me know about OptimizeGoogle, a maintained fork. So, I've now updated the slides (this slide and #20) to reference the newer extension. In any event, it's worth remembering the lessons of that episode: As I mention on the (revised) slide, if a Firefox extension goes dusty and unmaintained, look around for a maintained successor (forked version). The episode also illustrates one limitation of the Firefox extensions interface: its application interface keeps changing, so an unmaintained extension sooner or later stops installing cleanly.
Slide 14. This slide is, again, there because otherwise I'd Get Questions. BetterPrivacy and Abine (TACO) have a lot of fans. Me, I think they're both a bit bloated. A few nice words about Abine: When they heard about the Beef Taco fork, one of their managers wrote to the fork maintainer to compliment him and offer help maintaining the cookies list, an offer they then made good on. Points for them!
Slide 15. Whenever I talk about the Web and security, dumb ideas come up.
The worst is the entire category of 'enumerating badness', to borrow Marcus Ranum's term -- blocklists of very particular sites and behaviours and code snippets that are to be snubbed when encountered. That's a losing strategy because you're never done, can never catch up, and can't even keep pace with the development of badness -- and also because you're not solving the right problem. Blocklists of malware signatures are a dumb idea because it's smarter not to trust and run untrustworthy code from nowhere-in-particular in the first place. Blocklists of phishing sites are a dumb idea because it's smarter to make sure your bank site is really your bank site in the first place. Particularly pernicious is the window-and-mouse concept of 'opening' files in a file browser, where the 'open' operation is a bit of concealed magic: you typically don't know clearly in advance whether the file is to be viewed or executed, and with what associated software. This habit of mind was sloppy with non-networked computers, but tends towards disaster when applied to files from public networks. The key point is that files arriving from public networks need to be assumed dangerous. That includes HTML/Javascript, of course.
Slide 16. Linux users need to be very careful about picking up system-component files and packages from any-old-where. There are good reasons why we have distro package maintainers: They vet security, quality, and conformance with distro content policies. Use them! And be ultra-cautious when you cannot. In particular, any browser extension has full access to the capabilities of the browser. If you aren't pretty sure you should trust it, you shouldn't. Also, don't assume that browser extensions are open source, let alone audited by the open source community, just because they're for Firefox. Most extensions are proprietary, many are utterly terrible, and some are doubtless by crooks wanting to cheat and manipulate you.
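The difference between enumerating badness (Slide 15) and default deny fits in a few lines. This is a minimal Python sketch applying each policy to a script's source URL; the host lists and function names are made up for illustration. Default deny is the stance NoScript takes toward Javascript.

```python
from urllib.parse import urlsplit

# Hypothetical host lists, purely for illustration.
BLOCKLIST = {"ads.example.net", "tracker.example.org"}
ALLOWLIST = {"linuxmafia.com", "www.mybank.example"}


def blocklist_allows(script_url: str) -> bool:
    """Enumerating badness: run anything not already known to be bad."""
    return urlsplit(script_url).hostname not in BLOCKLIST


def allowlist_allows(script_url: str) -> bool:
    """Default deny: run only what you've chosen to trust."""
    return urlsplit(script_url).hostname in ALLOWLIST


if __name__ == "__main__":
    new_tracker = "http://brand-new-tracker.example.com/t.js"
    print(blocklist_allows(new_tracker))  # True: nobody has listed it yet
    print(allowlist_allows(new_tracker))  # False: never trusted, never run
```

The blocklist is always one new domain behind; the allowlist's cost is the occasional legitimate site you add by hand, once.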
In 2009, a so-called 'screensaver' was widely distributed on the community site gnome-look.org in .deb package format, until someone had a good look at it and noticed that it trojaned target systems. People asked why the gnome-look.org admins permitted such uploads, and the admins rightly pointed out that gnome-look.org is 100% self-service with no promise of quality control. Not enough people asked themselves 'Wait, why should a screensaver file be in a distro package, and need root authority to install?'
Slide 17. There was a comical 2009 squabble between the makers of the two most important Firefox extensions for browser security -- but everyone who relied on distro packages rather than rushing to 'upstream' avoided all of that idiocy. There's a lesson in that, somewhere.
Slide 18. NoScript is the single most effective measure available to cut the rubbish out of browsing in 2011. People sometimes give up on it the moment they encounter any site with, say, non-functional buttons driven by Javascript, but it's really easy to get past that by permanently allowing scripts on the few problem sites.
Slide 19. Adblock Plus removes a breathtakingly large number of ads from the Web. Suddenly, everything's a lot cleaner, and your browser isn't wheezing from RAM exhaustion and falling over several times a week. Stick to the EasyList subscription, unless curious about the others.
Slide 20. This is the slide I primarily re-edited after giving my talk, replacing an entire section about how to kludge the now-unmaintained CustomizeGoogle extension into recent Firefox versions with a simpler and happier one about OptimizeGoogle.
Slide 21. User Agent Switcher is useful for about 3/4 of the sites that claim not to be compatible with your browser, which is generally because some Web weenie decided to shut the door in your face, not because your browser is truly incompatible.
Slide 22. Objection is a very modest and now-unmaintained (but functional) extension to autoprune Flash cookies.
Greg Yardley, who wrote it, went on to create BetterPrivacy, which he says was much better code, but he unfortunately turned it over to a successor who made it proprietary. Hence, I continue to recommend either Objection or a simple shell script for the same purpose (next slide).
Slide 23. This is a dirt-simple, primitive cronjob to whack away all Flash cookies every day (or as often as desired), while leaving alone the files I think are harmless. It's worth mentioning that Adobe/Macromedia's recommended alternative is to use your browser with its Flash interpreter to visit a 'Flash Settings Manager' at their corporate site to manage your Flash cookies: http://www.macromedia.com/support/documentation/en/flashplayer/help/settings_manager.html Call me cynical, but this strikes me as a bit like hiring the fox to manage the henhouse. I'd rather manage Flash cookies without the help of the people who created the problem.
Slide 24. Beef Taco (open-source fork of TACO = Targeted Advertising Cookie Opt-out). Since overly persistent HTTP cookies from overbearing firms' domains are the problem, how about pre-empting them with harmless cookies? That's what Beef Taco does, by setting opt-out cookies for those domains ahead of time.
Slide 25. DOM Storage is a new-ish place where Firefox can store state. Nothing is in mine, on account of other measures I've taken -- but I show how to query what's in yours.
Slide 26. Sundry bits of mildly paranoid browser configuration, details of which (and more suggestions) you can find at http://linuxmafia.com/~rick/firefox.html .
Slide 27. A laundry list of various ways your browser and your network configuration can give out information about you to anyone-at-all. It's useful to consider, from time to time, what a thief can learn about you, your doings, your finances, etc., if he or she steals your laptop. This slide includes some of that.
For example, suppose you are really careful about the password-caching feature in your browser, to the point that you carefully avoid trusting it to store passwords for the five or six most security-sensitive domains you deal with, such as, say, your employer's VPN and your bank. Wow, the thief is going to see a nice list of your most vital domains, right there in your browser's list of sites whose passwords are never to be saved. That's very convenient!
I include a few words of encouragement for anyone who might consider running a good recursive nameserver such as Unbound locally, on one's home or office LAN, rather than taking the path of least resistance and relying on someone-I'm-not-sure-but-probably-my-ISP's nameservers. Those, despite best intentions, are usually ongoing security disasters (starting with cache poisoning), performance bottlenecks, and places where all the DNS names you're interested in get logged in someone else's logfiles. Why do that when running Unbound is dead-simple?
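Before deciding whether to run Unbound, it helps to know whose nameservers you're using now; on Linux that is normally recorded in /etc/resolv.conf. Here is a small Python sketch, assuming the usual 'nameserver ADDRESS' lines (the function names are made up), that lists the configured servers and reports whether they are all loopback or private LAN addresses. A LAN address only means your queries go somewhere on your LAN: a home router there may simply forward everything to the ISP anyway.

```python
import ipaddress
from typing import List


def configured_nameservers(resolv_conf_text: str) -> List[str]:
    """Return the nameserver addresses from resolv.conf-style text, in order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Comment lines begin with '#' or ';', so their first field never
        # equals 'nameserver' and they are skipped along with blank lines.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def uses_only_local_resolver(servers: List[str]) -> bool:
    """True if every nameserver is a loopback or private (LAN) address."""
    def is_local(addr: str) -> bool:
        try:
            ip = ipaddress.ip_address(addr.split("%", 1)[0])  # drop IPv6 zone ID
        except ValueError:
            return False  # not a plain IP address; don't count it as local
        return ip.is_loopback or ip.is_private

    return bool(servers) and all(is_local(s) for s in servers)


if __name__ == "__main__":
    try:
        with open("/etc/resolv.conf") as f:
            found = configured_nameservers(f.read())
    except FileNotFoundError:
        found = []
    print(found, "local only:", uses_only_local_resolver(found))
```

With Unbound listening on the local machine, you would expect to see just 127.0.0.1 (or ::1) here.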