How Google Book Search transformed from impossible to inevitable

Google Digitization signs are all over the Michigan engineering library. (Photo credit: Wikipedia)

In a widely reported copyright fair use decision, Judge Denny Chin ruled that the Google Books program constituted fair use, denying claims of the Authors Guild that the scanning of 20 million library books and posting snippets of those works online infringed the rights of authors.

The litigation history reflects the transformation that has taken place on the internet in the past decade. In 2004 Google entered into an agreement with several universities, beginning with the University of Michigan.

Google began the process of digitizing books at the nation’s great libraries, starting at the University of Michigan, the alma mater of company co-founder Larry Page. “Even before we started Google, we dreamed of making the incredible breadth of information that librarians so lovingly organize searchable online,” said Page. A 2005 lawsuit resulted in three years of negotiation and a proposed settlement in 2008. That settlement collapsed amid antitrust concerns and questions about the fairness of the representation of the plaintiffs’ sub-classes.

As the Google Books program evolved, two discrete projects operated. In the Partner Program “works are displayed with the permission of the rights holder.” The rights holders had the ability to opt out of the scanning, but in 2012 the Association of American Publishers settled with Google. According to the decision, “As of early 2012, the Partner Program included approximately 2.5 million books, with the consent of some 45,000 rights holders.” The participation suggests an industry voting with its feet.

Under the publisher agreement, Google stopped displaying ads with the publisher’s books. In turn, the publishers provide Google with the books. This settlement, even more than the two district court decisions, effectively ended the dispute – leaving the two lawsuits as mop-up activities.

In the HathiTrust litigation, Judge Harold Baer determined that Google’s Library Project partners, which made up the HathiTrust partnership, were entitled to fair use protection for the digitization of the 20,000,000 volumes copied and used by the libraries. The decision highlighted the benefits to visually-impaired students and researchers who had access to content not previously available through audio readers or braille, the benefits of digital search functionality, and the importance of protecting the library collections from physical harm and erosion.

In both opinions, the courts highlighted the new research opportunities created by the digital database:

Mass digitization allows new areas of non-expressive computational and statistical research, often called “textmining.” One example of text mining is research that compares the frequency with which authors used “is” to refer to the United States rather than “are” over time. Quoting the brief of the Digital Humanities amicus, “it was only in the latter half of the Nineteenth Century that the conception of the United States as a single, indivisible entity was reflected in the way a majority of writers referred to the nation.”
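For readers curious what such non-expressive research looks like in practice, the comparison described in the opinion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-sentence corpus, the function names, and the 50-year grouping are all invented here, and a real study would run over millions of digitized volumes rather than a handful of made-up sentences.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus of (publication year, text) pairs. These sentences
# are invented for illustration and are not drawn from any real collection.
CORPUS = [
    (1820, "The United States are bound by treaty."),
    (1845, "The United States are a confederation. The United States is growing."),
    (1880, "The United States is a nation."),
    (1895, "The United States is at peace, and the United States is prosperous."),
]

# Match the phrase followed by either verb form; the verb is captured.
PATTERN = re.compile(r"\bUnited States (is|are)\b", re.IGNORECASE)


def verb_counts_by_period(corpus, period_years=50):
    """Count 'United States is' vs. 'United States are' per time period."""
    counts = defaultdict(Counter)
    for year, text in corpus:
        period = year - year % period_years  # e.g. 1845 falls in 1800
        for match in PATTERN.finditer(text):
            counts[period][match.group(1).lower()] += 1
    return dict(counts)


def share_singular(counter):
    """Fraction of matches using 'is'; None when there are no matches."""
    total = counter["is"] + counter["are"]
    return counter["is"] / total if total else None


if __name__ == "__main__":
    for period, counter in sorted(verb_counts_by_period(CORPUS).items()):
        print(period, dict(counter), share_singular(counter))
```

The point of the sketch is that the program never displays or reproduces the underlying books. It reads the text only to tally word usage, which is why the courts characterized this kind of research as non-expressive.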

The Google decision followed the same path, highlighting the benefits of digital search, the limits placed on commercial exploitation by Google, and the pro-market effects agreed to by the publishers. “Google Books expands access to books.” With this simple sentence, the court highlights the essence of the eight years of litigation. In looking at the transformative nature of the fair use test, the court explained, “Google Books does not supersede or supplant books because it is not a tool to be used to read books.”

The court does not discuss the tremendous value the Google Books program provides to the search engine, speech recognition, and other algorithms operated by Google. It also dismisses the intermediary copying as a necessary function to enable the research and archival function to be exploited. But it does highlight that Google “does not run ads on the About the Book pages that contain snippets” and that Google “does not engage in the direct commercialization of copyrighted works.”

Google’s settlements and decisions not to commercialize the Google Books program likely tipped the scales with the publishers and may have strongly influenced the courts. Unlike Judge Baer, Judge Chin does not even discuss the potential to license the digitized database to Google. Baer rejected the potential to license the database as speculative. Moreover, since new works are added by voluntary participation with the publishers, the licenses for new works are included.

The decision appears to be a simplistic fair use summary that could lead casual observers to wonder why it required eight years of litigation. But changes to the conduct of both parties are what really led to this simple decision. Google adapted its behavior to limit its commercialization of the works. Publishers shifted their position from one of demanding opt-in, ex ante control to recognizing that the opt-out partnership met their needs. Eight years of experience did not produce significant evidence of authors being harmed as a result of snippet-searches replacing library purchases of academic texts.

In addition, the role of digital texts has changed. The Amazon Kindle and Apple iPad have paved the way for a fundamental shift in the relationship authors have with electronic texts. Market forces proved Google correctly anticipated a highly reconstructed book industry. Google was only one of the players bringing about this change.

Both the HathiTrust litigation and the Authors Guild v. Google litigation will likely be appealed, but there is little appeal in undoing the transformations to publishing that the Google Books program began.

Comprehensive Copyright Review – The First Steps of a Very Long Journey

House Judiciary Committee Chairman Bob Goodlatte has announced that the Judiciary Committee will conduct a comprehensive review of U.S. copyright law over the coming months. The comprehensive review is not tied to any particular legislative agenda, but it will serve as an open invitation to content industries, technology industries, and the public in a way that likely never occurred in any of the Copyright Act’s prior legislative reforms.

Chairman Goodlatte emphasized the evolution of technology and media in his remarks:

The discussions during the early 1900’s over the need to update American copyright laws to respond to new technology were not the first time such discussions occurred and they will certainly not be the last. Formats such as photographs, sound recordings, and software along with ways to access such formats including radio, television, and the Internet did not exist when the Constitution recognized intellectual property. My Committee has repeatedly held similar discussions about new forms of intellectual property as they arose and enacted laws as appropriate. Driven by new technologies and business models, a number of changes to copyright law went into effect in 1976.

No one should expect immediate legislation. As Register of Copyrights Maria Pallante noted in her recent congressional testimony, “a major portion of the current copyright statute was enacted in 1976. It took over two decades to negotiate, and was drafted to address analog issues and to bring the United States into better harmony with international standards, namely the Berne Convention.” Even there, the effective date for U.S. adherence to the Berne Convention took until March 1, 1989.

In the decades of negotiation over copyright reform in the past, the tension was primarily between the commercial interests of the content industries (film, television, music, and publishing) and the trade unions, authors, and creative interests. But that focus has shifted dramatically with the rise of the information age.

The defeat of SOPA highlighted the tension between the technology industries (led by the ISPs, Google, Apple, Microsoft, eBay, Facebook, and Wikipedia) and the content industries. In this fight, the content industries continue to lose. They could not push ACTA and they have lost in the courts over first sale in Kirtsaeng v. John Wiley & Sons, secondary liability in Viacom Int’l v. YouTube Inc. and Tiffany v. eBay, Inc., and many others.

Even more importantly, the rise of social media and the role copyright now plays in, or the way it interferes with, the daily lives of ordinary citizens means that the public’s interest in this debate will be higher than ever. Organized by social media companies like Facebook, LinkedIn, Twitter, Google and hundreds of others, the public will be exhorted to be heard every time they log on or check in. This is a great change for democracy. But we shouldn’t forget that those intermediaries are also the very technology companies that have their own stake in the outcomes.

Register Pallante has indicated some of the critical issues before the Judiciary Committee (though the explanation and approach are mine, not Register Pallante’s):

  • First sale doctrine – which could include both (i) a review of Kirtsaeng (2013) which internationalized first sale, and (ii) technologies that allow for a digital forward-and-delete that mimics first sale in the online environment;
  • Orphan works – questions about how to handle works for which the ownership information or the transfers of ownership have been lost;
  • Library exceptions – addressing digital collections and the ability to gain far greater usage out of far fewer copies;
  • Statutory licensing reform – on rate setting and rates;
  • Federalization of pre-72 sound recordings – resolving the issues involving retroactive pseudo-copyright protection for these works and the implications on the public domain;
  • Resale royalties for visual artists – addressing the conflict with those states which provide these rights and potentially creating national legislation;
  • Copyright small claims procedure or courts – adding a mechanism for copyright to be enforceable for small scale claims; and
  • Mass digitization of books – addressing the myriad of problems triggered by the intermediate copyright violations of works, the fair use of showing snippets, the procedural issues in the project, and many other concerns.

This list does not include many other potential areas for reform, including some of my preferred topics:

  • Explicit free speech and human rights accommodations for the statute, since copyright and First Amendment issues increasingly intersect;
  • Expanded fair use or copyright exemptions codified under Section 110 for digitization, reverse engineering, comparative advertising, and others;
  • Anti-circumvention (DMCA) reform to prohibit its use in commercial products – such as cars, printers, garage doors, and other goods;
  • Expanded registration requirements so that most of the economically insignificant works people create daily are outside of the copyright regime;
  • Statutory Damage Reform to tie statutory damages more closely to actual damages and separate commercial infringers from consumers;
  • Mandatory cease-and-desist system so that no one can be sued for copyright damages unless they have been notified directly that the conduct is infringing and they continue after a reasonable opportunity to cure has been provided; and
  • Broader non-commercial exceptions to copyright analogous to the public/private distinction of the 1909 Act.

Copyright needs to continue to adjust to address these issues. While the system is not broken, there are many strains. Again, from Chairman Goodlatte:

There is little doubt that our copyright system faces new challenges today. The Internet has enabled copyright owners to make available their works to consumers around the world, but has also enabled others to do so without any compensation for copyright owners. Efforts to digitize our history so that all have access to it face questions about copyright ownership by those who are hard, if not impossible, to locate. There are concerns about statutory license and damage mechanisms. Federal judges are forced to make decisions using laws that are difficult to apply today. Even the Copyright Office itself faces challenges in meeting the growing needs of its customers – the American public.

It will be important to be heard on these issues and to think carefully about a system that is good for today’s issues, tomorrow’s challenges and the decades of unanticipated changes the new law will cover.

Beyond Google’s Looking Glass – The Internet of Things is Already Here

Perhaps triggered by the New York Times coverage of Google Glass, the FTC announced both a call for submissions and a workshop related to the Internet of Things and its implications for privacy, fair trade practices, and the security of both data and people. The FTC announcement highlights both the benefits and risks of device connectivity.

Connected devices can communicate with consumers, transmit data back to companies, and compile data for third parties such as researchers, healthcare providers, or even other consumers, who can measure how their product usage compares with that of their neighbors.  The devices can provide important benefits to consumers:  they can handle tasks on a consumer’s behalf, improve efficiency, and enable consumers to control elements of their home or work environment from a distance. At the same time, the data collection and sharing that smart devices and greater connectivity enable, pose privacy and security risks.

The issue is not new. The ITU released a 2005 study discussing the implications of the Internet of Things. The ITU described a near, technological future in which “industrial products and everyday objects will take on smart characteristics and capabilities. … Such developments will turn the merely static objects of today into newly dynamic things, embedding intelligence in our environment, and stimulating the creation of innovative products and entirely new services.”

I have previously described some of these concerns in an article, Mortgaging the Meme.[1]

In each of these situations, an automated and consumer-defined relationship will replace the pre-existing activities. In many situations, this will create efficiency and convenience for the consumer, but it will also reduce the opportunities for human interaction and subtly rewrite the engagement between customer and company. Those that understand this change will adjust their technologies to improve the service and increase the customer’s reliance on its systems. Companies that do not understand how this engagement will occur, risk alienating customers and losing markets quickly.

Beyond consumer interactions, other uses may arise. Ethical and privacy concerns regarding misuse tend to focus on government, business and organized crime. These include unwarranted surveillance, profiling, behavioral advertising and target pricing campaigns. As a result, as companies increasingly rely on these tools, they also bear a responsibility to do so in a socially positive manner that increases the public’s estimation of the company.

The FTC submissions and workshop are overdue. Reading the New York Times quote regarding app developers, there is a sense that, unlike technology giants such as Microsoft and Google, the developers are thinking more about the technology’s potential than its potential impact. One such example from the Times: “‘You don’t carry your laptop in the bathroom, but with Glass, you’re wearing it,’ said Chad Sahlhoff, a freelance software developer in San Francisco. ‘That’s a funny issue we haven’t dealt with as software developers.’”

Many fields will benefit from increased device connectivity. Just a few:

  • Public transportation systems designed around real-time usage and traffic patterns.
  • Prescription monitoring to help patients take the right medications at the correct time.
  • Fresher, healthier produce.
  • Protection of pets and children.
  • Social connectivity, with photo-tagging and group-meeting moving into the real world.
  • Interactive games played on a real-world landscape.

There are also law enforcement uses that must be carefully considered. After the Boston Marathon attack, for example, calls for public surveillance will undoubtedly increase, including calls for adding seismic devices and real-time echo-location. Gunshots, explosions, and even loud arguments could become self-reporting.

Common household products sometimes become deadly in large quantities. RFID technology could be used to monitor the concentration of potentially lethal materials and provide that data to the authorities.

The consumer use, public use, and law enforcement use must be thoughtfully reviewed to balance the benefits of the technology with the intrusions into privacy and the legacy of retrievable information that such technology creates.

FTC staff will accept submissions through June 1, 2013, electronically through iot@ftc.gov or in written form. The workshop will be held on November 21st. These are the questions posed by the FTC thus far:

  • What are the significant developments in services and products that make use of this connectivity (including prevalence and predictions)?
  • What are the various technologies that enable this connectivity (e.g., RFID, barcodes, wired and wireless connections)?
  • What types of companies make up the smart ecosystem?
  • What are the current and future uses of smart technology?
  • How can consumers benefit from the technology?
  • What are the unique privacy and security concerns associated with smart technology and its data?  For example, how can companies implement security patching for smart devices?  What steps can be taken to prevent smart devices from becoming targets of or vectors for malware or adware?
  • How should privacy risks be weighed against potential societal benefits, such as the ability to generate better data to improve healthcare decision making or to promote energy efficiency?
  • Can and should de-identified data from smart devices be used for these purposes, and if so, under what circumstances?

While the FTC has asked some good questions, they are only the beginning. Please submit your thoughts and join the FTC conversation.


[1] Jon M. Garon, Mortgaging the Meme: Financing and Managing Disruptive Innovation, 10 NW. J. TECH. & INTELL. PROP. 441 (2012).

Journalism audience, revenue, and depth all in decline suggests Pew Center study

The Pew Research Center’s Project for Excellence in Journalism paints a dismal picture regarding the transformation of American news media. The Center describes “a news industry that is more undermanned and unprepared to uncover stories, dig deep into emerging ones or to question information put into its hands” than at any time in recent history. Among the findings:

  • Sports, weather and traffic now account on average for 40% of content
  • Newsroom cutbacks in 2012 put the industry down 30% since its peak in 2000
  • Some media outlets, such as Forbes magazine, use technology by a company called Narrative Science to produce content by way of algorithm
  • Media campaign reports were primarily megaphones, rather than investigative journalism

In response to the declines, the Center reports, “nearly a third of U.S. adults, 31%, have stopped turning to a news outlet because it no longer provided them with the news they were accustomed to getting.”

There is some financial restructuring of the industry as well. In most cases, however, the restructuring moves revenue away from news media and towards aggregators such as Google and Facebook. Economically this is another situation where the company providing the conduit for content receives the revenue rather than the individuals and companies providing the actual content. The good news is the slight increase in Sunday newspaper subscriptions and the end to the decline in overall newspaper sales.

In total, however, the report makes clear that while there is more information than ever before, there is less in-depth news coverage.

In a report published last year, the Pew Center found that “for every $1 newspapers were gaining in digital ad revenue, they were losing $7 in print advertising” and the gap grew to $16 in print losses for every digital dollar gained by the end of 2012. Some papers are returning to pay walls to offset the losses; others are accelerating their cost-cutting in print and reporting expenses to cover the gap.

While digital revenues continue to grow, the income is not fueling journalism. Instead, it pays for mobile devices, social media and search. While each of these has benefits, journalism has a uniquely important role in society – unfortunately that role will continue to shrink as budgets wane, reports become more superficial, audiences erode, budgets shrink in response – and the cycle goes inexorably downward.

iPad Newsstand provides some revenue to the publishers, but at a steep price paid to Apple as the newsstand vendor. Zinio and Kindle are also out there.

Perhaps it is time to rethink what we pay for with our home entertainment dollars. Maybe the bundle of services will cover a few dozen fewer unwatched cable channels and put a few cents into the digital edition of the local paper. Certainly it is time to rethink media ownership and financing rules for the digital market.

Google, EPIC and the Values of Disaggregation

On January 24, 2012, Google announced a significant revamping of its privacy policies, consolidating them into a single, comprehensive approach. The simplicity of the policy is undoubtedly good news for the legions of users who pay attention to such issues.

The bad news is that not only are the policies consolidated, so is the underlying data. Google, which runs dozens of discrete services, will be integrating the data collection into a single, comprehensive and interconnected data set – the ultimate Big Data of consumer behavior. The reason is simple: Google’s revenue is tied exclusively to advertising. The better the data integration, the more valuable each ad is when served to a prospective customer.

Google has thus far ignored EU requests to delay the roll-out. Instead, Google insists that any delay would cause more confusion. In a lengthy response, Google explained the reasons behind merging several policies into one. Among the features that better integration will support is integrated usage across services.

Our ability to share information for one account across services also allows signed-in users to use Google+’s sharing feature –called “circles”– to send directions to family and friends without leaving Google Maps. And a signed-in user can use her Gmail address book to autocomplete an email address when she’s inviting someone to work on a Google Docs document.

The answers, however, do not respond to the duties of Google under last year’s FTC consent decree regarding the ill-fated launch of Google Buzz. The Electronic Privacy Information Center (EPIC) has sued to force the FTC to enforce the consent decree and thwart the new privacy policy. The District Court is scheduled to hear the case in advance of the March 1, 2012 launch date.

According to EPIC’s Complaint, Google has violated a number of its consent decree obligations:

  • Misrepresenting the extent to which Google maintains privacy and confidentiality.
  • Misrepresenting the extent to which Google complies with the U.S.-EU Safe Harbor Framework and data security obligations.
  • Failing to provide adequate notice of, and obtain consent to, changes in Google privacy policies.

The critical commentary has been mixed, with some finding the changes a tempest in a teapot while other analysts express greater concern.

Perhaps most troubling is Google’s own set of comments:

So, here’s the real story:

  • You still have choice and control. You don’t need to log in to use many of our services, including Search, Maps and YouTube. If you are logged in, you can still edit or turn off your Search history, switch Gmail chat to “off the record,” control the way Google tailors ads to your interests, use Incognito mode on Chrome, or use any of the other privacy tools we offer.
  • We’re not collecting more data about you. Our new policy simply makes it clear that we use data to refine and improve your experience on Google — whichever products or services you use. This is something we have already been doing for a long time.
  • We’re making things simpler and we’re trying to be upfront about it. Period.
  • You can use as much or as little of Google as you want. For example, you can have a Google Account and choose to use Gmail, but not use Google+. Or you could keep your data separate with different accounts — for example, one for YouTube and another for Gmail.

Privacy on Google requires turning its services off, operating in Incognito mode, creating multiple accounts or avoiding the products.  Google is increasingly transparent. It does not wish to provide privacy and will make private transactions increasingly difficult. That the data has always been collected is true; that Google can exploit it more effectively is perhaps the real story – and real danger.

The approach is the opposite of the FTC privacy-by-design initiative. Almost everything one does will be tracked. Suddenly search has become quite expensive.