Tag Archive for Google

Book Review: If You Knew Google like She Knows Google . . . .

The Genealogist’s Google Toolbox, by Lisa Louise Cooke (2011)

If we were all in junior high school, I doubt that anyone would hang the moniker “Geek Girl” on Lisa Louise Cooke. She just seems so socially well-adjusted. But there has to be a little bit of geek in anybody who could write such a clear and cogent guide to expert use of Google.

Although it is titled The Genealogist’s Google Toolbox, just about anyone who uses a computer would find this book useful and fascinating. The book exposes little-known tips about Google and shows how to better utilize some of the well-known aspects of this more-than-a-search engine. Cooke has a great idea about turning your iGoogle page into a genealogical “dashboard” and discusses the gadgets available with which to do that. The book begins with the basic Google search and goes right through to cover Gmail, YouTube, and Google Earth, among other things. Novice and expert alike will find something to enjoy about this book.

Lisa also highlights some sources that can sometimes be overlooked, such as Google Books. I’ve long been a fan of Google Books as a source of historical background and sometimes specific individual genealogical information. I was glad to see it included in the book.

Google’s capacities are probably far underutilized by many genealogists. This book will excite you about Google’s many services, and perhaps even help you break down a brick wall or two.

It’s available from Lulu.com in both electronic form and hard copy. I bought the hard copy book, the first time I’ve ever purchased anything from Lulu.com.  I found the process easy and the shipping was swift.

QwikTip: Finding Your Ne’er-Do-Well Ancestors and Relatives

Here’s a quick and inexpensive way that you might find out about your felonious ancestors and relatives:

1. Go to Google Books, type in “State v. [name]” and see what comes up. Try these variations:

  • For some states, type in “People versus [name]”
  • For Pennsylvania, Kentucky and Virginia, “Commonwealth v. [name]”
  • For British, old Canadian, and old Australian cases, type “R. versus [name]”
  • For cases in U.S. federal courts, including military courts-martial, “United States v. [name]”

2. If the case is older than 1923, its appellate report can likely be found on Google Books without copyright restrictions.

3. If the case is a newer case, note the citation given and take it to your local county law library or other public library that has legal sources, and ask the law librarian to find it for you. The copyright notice notwithstanding, you are entitled to make at least one copy for your personal research.
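The caption variations in step 1 can be generated mechanically. What follows is only a rough sketch: the function names are made up for illustration, and the search address (the `tbm=bks` parameter) is an assumption about how Google's book search URLs are formed, which may change. The quoted phrases themselves are the ones listed above.

```python
from urllib.parse import urlencode

# Caption styles taken from the list above.
CAPTION_PREFIXES = [
    "State v.",
    "People versus",
    "Commonwealth v.",
    "R. versus",
    "United States v.",
]


def caption_queries(surname):
    """Return one quoted search phrase per caption style (hypothetical helper)."""
    return [f'"{prefix} {surname}"' for prefix in CAPTION_PREFIXES]


def book_search_url(query):
    """Build a book-search address; the URL format is an assumption, not a documented API."""
    return "https://www.google.com/search?" + urlencode({"tbm": "bks", "q": query})


if __name__ == "__main__":
    for phrase in caption_queries("Smith"):
        print(phrase, "->", book_search_url(phrase))
```

Swap in the surname you are researching and paste each address into a browser, or simply type the quoted phrases into Google Books by hand.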

Ladies and Gentlemen, start your search engines!

Around the Research Circle

I’m doing some hard genealogy that I’ll write about very soon. But I had an interesting experience this evening: I was trying to track down some information about Marydale Plantation in Tensas Parish, Louisiana. I went first to Google, and listed among the results was a bookmark on diigo.com for Marydale Plantation! Placed there by me . . . I had forgotten it! That’s why God made diigo and Google!

Research Resource: Magazines Join Google Books

Google announced earlier this month that they now have online an archive of various magazines’ content. This is just the first iteration of an initiative that will bring online millions of pages of magazine content. The initiative is being carried out with the cooperation of publishers.

The magazines are searchable through Google Book Search and eventually will turn up in regular Google searches.

Those who research African-American history will be interested to know that Jet and Ebony are in the first group of magazines to be digitized and are online now.

This is the third major Google announcement this fall. The first was the addition of newspaper content; the second was the acquisition of the LIFE magazine photo-archives.

I’ve tried the magazine search and it is satisfactory.  From my viewpoint, however, I would like to see a mechanism to segregate magazine searches from book searches.

GeneaTechnology

What technology is indispensable to me as a genealogist and family historian? Well, don’t expect any big surprises here–my indispensable choices are rather pedestrian.

Hardware: My 2GB thumb drive is invaluable. It makes my data portable and it’s a decent backup.

Software: Okay, I know I’ll seem like a caveman, but for me it’s got to be PAF 5.2. I’ve also got the latest iteration of RootsMagic, but I started with PAF and it’s still like a companion you grow especially fond of.

Website: Although I subscribe to a number of commercial sites and regularly visit many excellent non-pay sites, there’s nothing like Google, with all its accessories like Google Books, Google Scholar, and so forth. I’ve broken through several brick walls with Google’s help when the commercial sites yielded nothing! I’ve just fallen in love with Google Street View!

Some Final Thoughts on "Did Ancestry Violate Copyright Law?"

I think an analysis of the statutory “fair use” factors can lead to the conclusion that Ancestry.com’s “Internet Biographical Collection,” as it was initially set up, did not constitute a fair use of the copyrighted material collected and used.

I think that Ancestry’s IBC probably does not qualify for the system caching “safe harbor” for infringement in the Digital Millennium Copyright Act. Some of Ancestry’s statements about the IBC suggest a certain permanence, not the “intermediate and temporary” caching that the law protects.

I think that issues of “fair use” and DMCA “safe harbor” as they relate to search engine caching have not yet had a full examination by the courts. I think that reliance on the few decided cases leaves a great deal of uncertainty, no matter how much the industry puts an optimistic face on it. I think that the United States Supreme Court will eventually decide these issues. The law yet may turn out to be as the industry wants it, but that’s not where we are today.

The few decided cases on this matter seem to suggest that the copyright holder must take steps to keep the protected content from being collected by the various bots that roam the Internet. In Field v. Google, for example, the court held that Field’s failure to utilize a “no-archive” meta-tag was the basis on which the court could find that Field had given Google an “implied license” to use his copyrighted content. This holding either turns the law on its head or shows how different copyright law is from other law or is an example of judicial value imposition.

Here’s what I mean by the last sentence above. To say that a web publisher gives an implied license to anyone who wants to take protected content if the publisher fails to use certain meta-tags or other technical means is akin to saying that one gives an implied license to a burglar if one’s door isn’t locked. That turns the law on its head. Or perhaps that’s just how different copyright law is.

Actually, I think it’s the third thing: judicially imposed values. By this I mean that judges have decided that there are salutary purposes served by the practices of companies like Google and that to enjoin them would constrain the economic growth of the Internet. This can be seen in the courts’ recitation of the “socially important purposes” served by Google, for example. [See Field v. Google, Inc., 412 FSupp2d 1106, 1119 (D.Nev. 2006)] In the nineteenth century, courts took a doctrinally similar approach to the development of law concerning railroads. Philosophically, I may agree with the notion that the law shouldn’t hinder the development of the Internet, but some of the questions about how the law operates with respect to the Internet are for the Congress, and not judges, to decide. That’s especially so when it appears that the judges got the intent of Congress wrong in the first place. [See the discussion of the DMCA in the post yesterday.]

====>So did Ancestry’s IBC violate copyright law? Well, lawyers are infamously cautious . . . . If I were advising Ancestry [which I am not] and sticking to a careful reading of the law, I would tell them to take the IBC back to the drawing board, because I would not be comfortable with the infringement risk that they took. It looks to me like they collected and archived and made available to third parties content owned by other publishers. I would tell them not to rely on the Field case, the Parker case, or the Kelly case. I might tell Google the same thing were I advising them [which I'm not]. Lindsay said in the comments the other day:

Google is very analogous to an ISP in that it is not in the content business. Ancestry on the other hand is taking content for the purpose of publishing it on their own site. One is a side effect of providing search services and has other uses, the other is appropriation for the purpose of publishing.

As to Google’s search engine, I think this is true.

====>Must content owners utilize meta-tags and “robots.txt” files to avoid giving an implied license? [What follows is opinion, not legal advice] I know that an unlocked door is no defense to a burglar, but I still lock my doors. The problem is that the burden of using these technical tools is, in many cases, fairly minimal, whereas the courts believe the burden on the service provider like Google to communicate with each publisher is substantial, if not insurmountable. Now some content owners do not necessarily have access to the page source to insert such code; they may have to ask their web-hosting services to help them out. If they won’t, get another host. Now I happen to believe that the Supreme Court may modify this emerging rule to some extent, but until they do, perhaps “safe rather than sorry” is a good idea.
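For readers who want to see what these technical tools look like, here is a minimal sketch using the robots.txt parser that ships with Python. The rules shown are only an example: the user-agent token "MyFamilyBot" is assumed from the crawler's name and may not be the exact string the bot sends, the example.org address is a placeholder, and whether any particular crawler honors the file is up to its operator. The meta tag at the end is the conventional "noarchive" form of the tag discussed in Field v. Google.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: shut out one crawler (token assumed) and allow everyone else.
ROBOTS_TXT = """\
User-agent: MyFamilyBot
Disallow: /

User-agent: *
Disallow:
"""

# Page-level alternative, placed in the <head> of a page, asking that no cached copy be kept.
NOARCHIVE_META = '<meta name="robots" content="noarchive">'


def is_allowed(robots_txt, user_agent, url):
    """Report whether the given rules permit this user agent to fetch this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.org/family/tree.html"  # placeholder address
    print("MyFamilyBot allowed:", is_allowed(ROBOTS_TXT, "MyFamilyBot", page))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", page))
```

The robots.txt file goes at the top level of the site; the meta tag has to be added to each page, which is why some content owners will need their hosting service's help.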

====>Does Ancestry’s Terms and Conditions of Use protect it in this matter? I think not.

====>What is the significance of “notice” of the MyFamilyBot? At some point, Ancestry.com added the following page to its site:

The MyFamilyBot Information Page:

What is MyFamilyBot? Why is it accessing my files?:

MyFamily is creating an index based on a powerful person-based biographical ranking engine that gives superior results over searches done using the more general purpose internet search engines. Ancestry.com indexes the biographic text and provides a search service that points users back to the originating website.

MyFamilyBot is the name of a web crawler (a.k.a. robot, spider) used by MyFamily.com to find biographical text on the Internet in connection with this engine. The crawler works by deeply crawling sites that contain biographical text. We have constructed the bot to limit its affect on site usage to be within the range of that of the large commercial search engines. Sites that do not contain biographical text are examined in a superficial manner.

How do I prevent MyFamilyBot from crawling my site?
MyFamilyBot supports the Internet standard protocols for restricting spiders from crawling web sites. These protocols are described here:
http://www.robotstxt.org/wc/exclusion.html

How can I contact someone concerning MyFamilyBot?
Please send questions and concerns about MyFamilyBot to SearchBot@MyFamilyInc.com.

I have no idea when this page was added and I’m still not sure how to access it on the Ancestry.com site. (Ironically, I used Google’s cache to find it). Some may believe that this page constitutes some sort of notice to Web publishers who thereupon should have put into place the well-known protocols for preventing “MyFamilyBot” from crawling their site. I don’t agree with this for a variety of reasons. First, I’m concerned about the adequacy of the notice. Second, the reasons I gave above apply here. This would continue to turn property law upside down.

CONCLUSION: Ancestry did the right thing by removing the IBC. They were in a fog of legal uncertainty. And more importantly, the rather surreptitious manner in which they established the IBC breached faith with their membership and the rest of the genealogical community. The legal issues ultimately will be resolved by the United States Supreme Court. The ethical and social issues can only be worked out if Ancestry reaches out to the community and engages the community in a genuine effort to close the breach. They’ve got some work to do on that issue. Now’s the time to start.

Part 1
Part 2
Part 3
Part 4

Notice: The information in this writing is intended for educational use only and is not intended nor should it be construed as legal advice. If you have a legal problem, consult a lawyer admitted to practice in your state of residence. I am an active member of the bar of the State of California and am admitted to practice before the United States Supreme Court and various other federal courts. I am not licensed to practice in any other state. I am not presently soliciting or accepting new clients in the matters discussed above.

Did Ancestry Violate Copyright Law? . . . . It Depends . . . . Part 4 of 4

Here are some important observations before we go on:

(1) Ancestry’s IBC is operationally unlike Google’s search engine. “Fair use” and direct infringement cases are highly fact-specific.

(2) Whether Google’s search engine is or is not “fair use” has yet to be considered adequately by a court because:

  1. The Field case involves unique facts (i.e., the plaintiff “set up” Google to get money from them).
  2. The Parker case relies to some extent on the Field case. The U.S. Third Circuit Court of Appeals affirmed the trial court in Parker, but ordered that its decision not be published. This means that it cannot be relied upon as precedent by other courts. It also may signal that the Court of Appeals does not have full confidence in the decision.

I do not agree that the Field case “clears away copyright questions that have troubled the entire search engine industry,” as an attorney with the Electronic Frontier Foundation said.

[For the record, I am not a Google-basher. I like Google. While we're at it, I generally like Ancestry.com, too, and I use it frequently. But I was angry to discover my content on their IBC.]

(3) It is possible that Ancestry.com, Google, or somebody else, could set up a genealogy-specific search engine that would “fairly use” both links and a cache of copyright-protected Web sites.

Now back to our series.

The Digital Millennium Copyright Act

In 1998, the Congress enacted the Digital Millennium Copyright Act (DMCA). There are a number of aspects to this statute, but here, the relevant matter is in Title II of the Act, which is known as the “Online Copyright Infringement Liability Limitation Act.” This part of the law creates limitations on the liability of online service providers for copyright infringement when engaging in certain types of activities. I discuss it here because the court in Field v. Google discussed it and several commenters have mentioned it.

Section 512(b) of the Act provides the so-called system caching “safe harbor” for online service providers. An online service provider is not liable for infringement of copyright by reason of the “intermediate and temporary storage of material on a system or network controlled or operated by or for the service provider” in certain circumstances. Those circumstances are where:


(A) the material is made available online by a person other than the service provider;
(B) the material is transmitted from the person described in subparagraph (A) through the system or network to a person other than the person described in subparagraph (A) at the direction of that other person; and

(C) the storage is carried out through an automatic technical process for the purpose of making the material available to users of the system or network who, after the material is transmitted as described in subparagraph (B), request access to the material from the person described in subparagraph (A), if the conditions set forth in paragraph (2) are met.


Before we go on to paragraph (2), let’s examine the portions above. First, the cache must be “intermediate and temporary.” In Field v. Google, the court had evidence before it that Google stored material for 14 to 20 days. Relying on a case called Ellison v. Robertson, 357 F.3d 1072, 1081 (9th Cir. 2004), involving AOL, the Field court held that 14 to 20 days was “intermediate and temporary.”

We don’t know how long Ancestry planned to keep material in its cache. There are hints in the company’s statements that can be interpreted to suggest a temporary cache and hints that suggest a longer storage. I would suggest that 14-20 days is probably on the outer limits of the plain meaning of “intermediate and temporary.” Certainly, if Ancestry intended to keep material in its cache longer than that, they would not qualify for the infringement liability “safe harbor.”

Now here’s the analysis of the rest of paragraph (1):


(A) the material is made available online by a person other than the service provider;

This provision is met when the copyright holder posts his or her content online. The content owner is “a person other than the service provider.”


(B) the material is transmitted from the person described in subparagraph (A) through the system or network to a person other than the person described in subparagraph (A) at the direction of that other person;

This means that the copyrighted content is accessed by someone other than the copyright owner from the copyright owner. Note that this provision suggests a very temporary caching, because the caching takes place when the content is accessed by a user from the content provider’s site. In Field, the court got this wrong. The court described Google, the service provider, as the “other person.” If Congress had intended the service provider to be “the other person,” Congress would have said so.

The point here is that to take advantage of the safe harbor, Ancestry would have to cache the material temporarily as it was being transmitted between content provider and content user. That’s not how they described what they were doing. Further evidence of my point is in subparagraph (C):


(C) the storage is carried out through an automatic technical process for the purpose of making the material available to users of the system or network who, after the material is transmitted as described in subparagraph (B), request access to the material from the person described in subparagraph (A) . . . .


A leading copyright expert says about this provision:

Thus, the literal language of Section 512(b) appears not to cover “advance” caching, in which material is copied into a cache for anticipated requests for it, rather than upon the first actual request for it . . . .

David L. Hayes, Advanced Copyright Issues on the Internet (2007) [The link is to a 412-page document. The quote is on page 307.]

This interpretation is also borne out by the legislative history of the DMCA. House of Representatives Report No. 105-551, part 2, page 52, includes the following:

For subsection (b) to apply, the material must be made available on an originating site, transmitted at the direction of another person through the system or network operated by or for the service provider to a different person, and stored through an automatic technical process so that users of the system or network who subsequently request access to the material from the originating site may obtain access to the material from the system or network.

Ancestry was doing “advance” caching, which would not protect it from infringement claims under the DMCA.
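For the technically inclined, the distinction between the two kinds of caching can be shown in a few lines of code. This is purely an illustration under my own assumptions: the class names are invented, the page fetcher is a stand-in, and neither class is meant to describe how Google's or Ancestry's systems were actually built. The first cache stores a page only after a user has requested it and drops it after a fixed period; the second copies pages ahead of any request and keeps them.

```python
import time


class RequestDrivenCache:
    """Keeps a copy only after someone has actually asked for the page, and only for a while."""

    def __init__(self, fetch, ttl_seconds, clock=time.time):
        self._fetch = fetch        # stand-in function: url -> page text
        self._ttl = ttl_seconds    # how long a copy counts as fresh
        self._clock = clock
        self._store = {}           # url -> (page, time stored)

    def get(self, url):
        entry = self._store.get(url)
        if entry is not None and self._clock() - entry[1] < self._ttl:
            return entry[0]        # served from the temporary copy
        page = self._fetch(url)    # first request, or the copy has expired
        self._store[url] = (page, self._clock())
        return page

    def cached_urls(self):
        now = self._clock()
        return {u for u, (_, stored) in self._store.items() if now - stored < self._ttl}


class AdvanceCache:
    """Copies pages before any user asks for them and keeps them indefinitely."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._store = {}

    def crawl(self, urls):
        for url in urls:
            self._store[url] = self._fetch(url)

    def get(self, url):
        return self._store.get(url)

    def cached_urls(self):
        return set(self._store)
```

Using a 14-day time limit in the first class would mirror the low end of the storage period described in Field; how a court would treat any particular design is, of course, a legal question and not a programming one.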

Recall that the safe harbor also requires that “the conditions set forth in paragraph (2) are met.” Having already found that Ancestry would not qualify for the DMCA safe harbor, we can assume that Ancestry would meet the other requirements without changing the result.

A careful reading of the DMCA leads to the conclusion that Ancestry’s IBC would not be safe from infringement claims under the Copyright Act.

Having spent a considerable amount of time on this, I need to take a day off. I meant this to be a four-part series, but so many good questions have been raised in the comments that I will answer (many of them, if not all) in one more post on Friday. Then I’ll get back to being a genealogist in this space.

COMING ON FRIDAY: Some Final Thoughts

Notice: The information in this writing is intended for educational use only and is not intended nor should it be construed as legal advice. If you have a legal problem, consult a lawyer admitted to practice in your state of residence. I am an active member of the bar of the State of California and am admitted to practice before the United States Supreme Court and various other federal courts. I am not licensed to practice in any other state. I am not presently soliciting or accepting new clients in the matters discussed above.

Did Ancestry Violate Copyright Law? . . . . Part 3 of 4: Fair Use

We’ve explored the Field v. Google, Inc., case thus far and learned about the facts of that case and some of the holdings. A number of commenters have insisted (and still insist) that because the court found Google’s caching to be “fair use,” the same result would obtain with respect to Ancestry’s Internet Biographical Collection.

I do believe that the matter of “fair use” is the most important issue in the analysis. But, I’ve said here that these cases are highly fact-specific. So before we get to the fair use analysis, let’s take a look at some of the factual matters that various commenters have raised since we started this series.

Janice said:

I do have two comments. I’m not sure it changes things in a legal sense, but Ancestry also provided an option (to subscribers only, and even after IBC became “free”) to click and save the cached page to their “Shoebox”–a holding area of documents that subscribers are interested in.

Also, the initial Ancestry.com source description calls the IBC a “database-online,” not a search engine (I have a screen shot of that if you need it).

Jeff Scism said:

Ancestry through a spokesperson clearly stated what the intent was initially, “the websites have VALUE. And even if the site owners were to remove the contents, the pages would remain available through Ancestry.com”

That indicates to my simple mind that, since only paying customers had original access, that Ancestry had premeditated their intent to take and sell the content, despite what the site owners decided to do with their creations. Their obvious attempt to actually hide the source pages at first and only provide a sanitized copy of the data- removing source website info, and identifying graphics, and copyright notices, shows that the intent was to steal and sell the content.

Another issue not addressed is that Family Tree maker, a genealogy program they sell, still has this search built in, and provides the data directly for merging into your family file, sourcing it as Ancestry’s collection, and no direct reference to the authors.

And Ancestry said on August 28, 2007:

Ancestry.com just added the Internet Biographical Collection which is a compilation of genealogy information across the web.

I also recommend that you check out the comments on Dick Eastman’s blog here and here.

Now, on to “Fair Use.”

“Fair Use” is a limitation on the exclusive rights of a copyright owner. It’s contained in section 107 of the Copyright Act. [Title 17, United States Code]. The Act says that use “for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research is not an infringement of copyright.” The United States Supreme Court has stated that the statute “calls for case-by-case analysis.” [Campbell v. Acuff-Rose Music, Inc., 510 US 569 (1994)]. In that respect, a court must analyze at least four factors:

(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
(2) the nature of the copyrighted work;
(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
(4) the effect of the use upon the potential market for or value of the copyrighted work.

Factor 1: Purpose and Character of the Use

In the case of Campbell v. Acuff-Rose Music, Inc., the Supreme Court said about this first factor of the “fair use” analysis:

The enquiry here may be guided by the examples given in the preamble to § 107, looking to whether the use is for criticism, or comment, or news reporting, and the like . . . . The central purpose of this investigation is to see . . . whether the new [use] merely “supersedes the objects” of the original creation . . . or instead adds something new, with a further purpose or different character, altering the first with new expression, meaning, or message; it asks, in other words, whether and to what extent the new work is “transformative” . . . . [510 US at 578-579]


The Field court looked to a case decided by the U.S. Ninth Circuit Court of Appeals to understand how to apply this “transformative” rule. (As a trial court within the Ninth Circuit, which covers the Far West, the federal district court in Nevada is obligated to follow the precedent of the Ninth Circuit Court of Appeals).

The case which the court followed is called Kelly v. Arriba Soft Corp., 336 F3d 811 (2003). In that case, the appeals court held that the use of copyrighted images that were displayed on Internet web sites by an operator of a “visual search engine,” which displayed search results as “thumbnail” pictures, was “fair use” of copyrighted images.

But before we jump to any conclusions about Ancestry’s IBC, it’s important to understand why the court came to that conclusion in the Kelly case.

Arriba, now known as “Ditto.com,” developed a computer program that “crawls” the web looking for images to index. This crawler downloads full-sized copies of the images onto Arriba’s server. The program then uses these copies to generate smaller, lower-resolution thumbnails of the images. Once the thumbnails are created, the program deletes the full-sized originals from the server. Although a user could copy these thumbnails to a computer or disk, the user cannot increase the resolution of the thumbnail; any enlargement would result in a loss of clarity of the image.

The court found Arriba’s use of Kelly’s photographs to be “transformative,” that is, Arriba’s use “added a further purpose or different character” to the photographs. The court said:

Kelly’s images are artistic works intended to inform and to engage the viewer in an aesthetic experience. His images are used to portray scenes from the American West in an aesthetic manner. Arriba’s use of Kelly’s images in the thumbnails is unrelated to any aesthetic purpose. Arriba’s search engine functions as a tool to help index and improve access to images on the Internet and their related web sites. In fact, users are unlikely to enlarge the thumbnails and use them for artistic purposes because the thumbnails are of much lower-resolution than the originals; any enlargement results in a significant loss of clarity of the image, making them inappropriate as display material. [336 F3d at 818].

The court observed that Arriba’s thumbnails of Kelly’s photographs “are not used for illustrative or artistic purposes and therefore do not supplant the need for the originals.” [336 F3d at 820]

The question now is whether, using this reasoning, Ancestry’s use of others’ content in its IBC was “transformative.” I would note first that, unlike Arriba with respect to Kelly, Ancestry is in the same field of endeavor as are the copyright owners whose material was used in the IBC. Ancestry’s purposes seem to have been the same as that of the content owners. Ancestry initially described the IBC as a “collection,” not a search engine. It would thus appear that Ancestry simply intended to make available on its genealogical site, initially for a fee, the exact same content owned by others. Unlike the “thumbnails” of “much lower-resolution than the originals” in Kelly’s case, the entire content of any collected site was available in the IBC. In other words, Ancestry’s use merely superseded that of the content owners. This is not a transformative use.

It cannot be said that either Field or Kelly, either individually or taken together, endorsed as “fair use” anything denominated a “search engine.” Again, the determination of “fair use” is fact-specific. For example, Google’s search engine operates somewhat differently than did Arriba’s. Along these lines, it is not at all clear that the IBC operated, or was intended to operate, like Google’s or Arriba’s search engines. The term “search engine” in reference to the IBC did not appear until after the controversy erupted.

For another example, is Google Books merely a “search engine”? I think as it was originally conceived, it was more of an electronic library than a search engine. Hence, the concern by publishers. But, as it now operates, Google Books is a bit of both search engine and electronic library. With respect to certain items now beyond copyright protection (those for which the user gets “Full View”), the electronic library aspect is fully operational. With respect to those items given the most protection (those for which the reader is given only a “Snippet”), Google Books is almost entirely a search engine. In the full electronic library mode, Google’s use clearly supersedes that of the original content creator.

Ancestry’s IBC, as we originally understood it, was more like Google Books than like Google’s “simple” search engine.

The court also considered the commercial use of Kelly’s work:

While such use of Kelly’s images was commercial, it was more incidental and less exploitative in nature than more traditional types of commercial use. Arriba was neither using Kelly’s images to directly promote its web site nor trying to profit by selling Kelly’s images [336 F3d at 818]


These do not seem to be the facts of Ancestry’s use, especially when the content was behind their paid subscriber wall.

Factor 2: Nature of the Copyrighted Work

The second statutory factor, “the nature of the copyrighted work,” draws on the value of the materials used. Works that are creative in nature are closer to the core of intended copyright protection than are more fact-based works. [Kelly v. Arriba, 336 F3d at 820]. The court found Kelly’s work to be creative, as would be found as to the content collected by Ancestry for the IBC.

The court also said that the fact that a work is published or unpublished also is a critical element of its nature. Published works are more likely to qualify as fair use because the first appearance of the artist’s expression has already occurred. [Kelly v. Arriba, 336 F3d at 820] The court found that these two elements caused this factor to weigh in favor of Kelly, but only slightly. The same could be said of the content included in the IBC. All of it had appeared on the Internet before it appeared in the IBC.

Factor 3: Amount and substantiality of portion used.

In Kelly v. Arriba, the court said:

While wholesale copying does not preclude fair use per se, copying an entire work militates against a finding of fair use. However, the extent of permissible copying varies with the purpose and character of the use. If the secondary user only copies as much as is necessary for his or her intended use, then this factor will not weigh against him or her. [336 F3d at 820-821]

The IBC seems to have copied the entirety of the works collected for the IBC. But what is the extent of permissible copying in light of the purpose and character of use in this situation?

In Field v. Google, the court said:

. . . Google’s use of entire Web pages in its Cached links serves multiple transformative and socially valuable purposes. These purposes could not be effectively accomplished by using only portions of the Web pages. Without allowing access to the whole of a Web page, the Google Cached link cannot assist Web users (and content owners) by offering access to pages that are otherwise unavailable. Nor could use of less than the whole page assist in the archival or comparative purposes of Google’s “Cached” links. Finally, Google’s offering of highlighted search terms in cached copies of Web pages would not allow users to understand why a Web page was deemed germane if less than the whole Web page were provided. Because Google uses no more of the works than is necessary in allowing access to them through “Cached” links, the third fair use factor is neutral, despite the fact that Google allowed access to the entirety of Field’s works. [412 FSupp2d at 1121]

The Field court’s reasoning might apply to Ancestry’s IBC if the IBC had the same function and functionality as Google’s search engine. But it’s not clear that the IBC operated like Google’s search engine. Ancestry could have copied only a “snippet” of the collected IBC works and included a link to the original and still have accomplished a worthy purpose, much like Google Books does.

Factor 4: The Effect of the Use upon the Potential Market for or Value of the Copyrighted Work

The Kelly court said:

This last factor requires courts to consider “not only the extent of market harm caused by the particular actions of the alleged infringer, but also ‘whether unrestricted and widespread conduct of the sort engaged in by the defendant … would result in a substantially adverse impact on the potential market for the original.’ ” [336 F3d at 821, citing Campbell v. Acuff-Rose Music, Inc.]

There are several potential markets for the content collected in the IBC. Many of the content owners sell advertising space on their sites. Ancestry obviously found the content valuable enough to initially place it behind their paid subscriber wall; a content owner could license their content to some potential competitor of Ancestry’s, like World Vital Records, for example. It could be reasonably said that Ancestry’s actions could harm the potential market for the content owners’ products. Some portion of the potential audience for the content would find it first at Ancestry’s site and because of the way it initially was set up at the IBC, that audience would find no need to go to the original site. And this is particularly so if, as alleged by one of the commenters, Ancestry has linked the IBC to the Family Tree Maker software.

Based on the foregoing analysis of the cases, a court could find that Ancestry’s use was not a “fair use.”

Here are several points to keep in mind:

(1) What’s said above is based on the premise that Ancestry’s IBC is not like Google’s search engine. For example, as far as we know, the IBC did not include a statement on “cached” pages that the user is viewing only a cached page. (Later, of course, Ancestry added links to the original pages). I think the two things may be factually distinguishable.

(2) Field should not necessarily be relied upon, because it’s the classic example of bad facts making bad law. That is to say, drawing a general rule out of that case is a bad idea because the facts are so unusual and egregious.

(3) There are a lot of facts about the IBC that remain known only to Ancestry. Some of those facts may help them; some may hurt them.

(4) Nobody, not a lawyer, not the Copyright Office, nobody but a court can finally settle what is or is not “fair use.”

I’m going to move our discussion of the Digital Millennium Copyright Act to tomorrow’s post. In that post, I’ll also address some issues that have come up in the Comments and share some thoughts on copyright protection.

Part 3 of Legal Analysis Temporarily Delayed

It’ll be here later today. I have to add a few things and I got busy with my first priority–my students!

Did Ancestry Violate Copyright Law? . . . . Part 2 of 4

Before we get to the heart of the legal analysis, here are some additional facts which may be legally significant. They were provided in the Comments to yesterday’s post by Janice Brown of Cow Hampshire. Janice first called my attention to this issue in late August.

Ancestry also provided an option (to subscribers only, and even after IBC became “free”) to click and save the cached page to their “Shoebox”–a holding area of documents that subscribers are interested in.

Also, the initial Ancestry.com source description calls the IBC a “database-online,” not a search engine . . . .

Janice is correct about these additional facts and we will analyze their legal significance.

Janice also writes:

Also, there were several people who argued in commentary on various blogs and message boards that we, as bloggers and web sites owners, should have known that Ancestry would be doing this, due to various announcements and press releases they made, and the burden was on each of us to place a robots.txt file or some sort of HTML coding to prevent Ancestry.com from caching our sites. Is the burden truly on the blogger or web site owner, even if they are not commercial (i.e., the “mom and pop” web sites and blogs).

We’ll explore what the courts have to say about this issue as well. At the end of the series, I’ll have some suggestions for copyright owners.

I should point out to all readers that this remains an unsettled and evolving area of law; this ride may prove a bit frustrating at times. Now on with the show . . . .

Field v. Google, Inc., 412 FSupp2d 1106 (D.Nev. 2006) [the link is to a PDF version of the court's Order], is the case that was cited by most commentators and bloggers concerning the Ancestry IBC issue. They opined that the outcome of that case likely would dictate the rule of law applicable to the IBC issue. My preliminary reaction was twofold: first, since Field is a decision of a trial court, the lowest level of the federal judiciary, no other court is obligated to follow it; and second, there are some unique facts in this case that may have had an influence on the outcome.

Blake Field is a lawyer in Nevada. He’s also a poet. Field was familiar with Google’s search and caching processes. With this knowledge, according to the court, “Field decided to manufacture a claim for copyright infringement against Google in the hopes of making money from Google’s standard practice.” [412 FSupp2d at 1113]. In January 2004, Field created fifty-one works and put them on a website, accessible for free. He also created a “robots.txt” file for his site because he wanted search engines to visit his site and include the site within their search results. The court notes that “Field knew that if he used the ‘no-archive’ meta-tag on the pages of his site, Google would not provide ‘Cached’ links for the pages containing his works.” [412 FSupp2d at 1114] So, he consciously chose not to use the “no-archive” meta-tag on his Website.

As Field intended and expected, the “Googlebot” visited his site, and indexed and cached its pages. Thereafter, each of Field’s pages was retrieved from Google’s cache by some individual or individuals.

Field sued Google for copyright infringement. “Field allege[d] that Google directly infringed his copyrights when a Google user clicked on a ‘Cached’ link to the Web pages containing Field’s copyrighted works and downloaded a copy of those pages from Google’s computers.” [412 FSupp2d at 1115; emphasis added] Field did not allege that Google infringed his copyrights when the Googlebot initially copied his pages and stored them in the system cache.

Following established legal precedent, the court pointed out that for copyright infringement, a plaintiff must show ownership by the plaintiff, and copying by the defendant. Furthermore, the copying must be the result of a volitional act on the part of the defendant. [CoStar Group, Inc. v. LoopNet, Inc., 373 F.3d 544, 555 (4th Cir.2004)].

Applying the law to the facts, the court ruled in favor of Google. The court said, “[W]hen a user requests a Web page contained in the Google cache by clicking on a ‘Cached’ link, it is the user, not Google, who creates and downloads a copy of the cached Web page. Google is passive in this process.” [412 FSupp2d at 1115] In other words, the court found no volitional act on the part of Google when a user accesses its system cache.

There’s more to the Field case, certainly. And certainly, it doesn’t answer questions such as whether the user can be sued for copyright infringement; whether Google is liable for infringement for the actions of its bot; and others. But let’s stop here for a moment and examine how the law would apply to Ancestry.

Presumably, the path leads in the same direction. That is, when a user clicked on the relevant link in the IBC, Ancestry would be “passive” in that process and thus there would be no infringement by Ancestry when users requested information from the IBC.

But a couple of facts seemed important to the court in reaching this conclusion. First, the court pointed out that pages retrieved from Google’s cache contain a “conspicuous” disclaimer that the cached page is not the “original” and that there are two separate links to the current page. It is not clear, or certainly was not clear at the outset, that Ancestry’s IBC would operate in that manner. Second, the court examined the purposes of Google’s cache. For example, “Google’s ‘Cached’ links allow users to view pages that the user cannot, for whatever reason, access directly.” As to the IBC, while it was behind Ancestry’s paid subscription wall, this was true only for paid subscribers. Additionally, Google’s cache enables users to determine how a Web page may have been altered over time as well as to determine more quickly whether and where a search query appears and thus whether the page is germane to the user’s query. It is not at all clear that Ancestry’s IBC would operate in this manner. Recall that Ancestry began calling it a “search engine” only after the negative initial response. We do not now know Ancestry’s true intent at the outset of this project or what would have happened had they chosen to press ahead despite the negative reaction. [These are matters that we might be able to discover through various procedures if litigation had been commenced].

Back to the Field Case: Google asserted several defenses to Field’s claim. First, Google asserted that Field had granted it an implied license to use his content. The law on this matter is that a copyright owner may grant a nonexclusive license expressly or impliedly through conduct. Melville B. Nimmer & David Nimmer, Nimmer On Copyright, vol. 3, section 10.03[A] (1989). An implied license can be found where the copyright holder engages in conduct from which the other party may properly infer that the owner consents to his use. The United States Supreme Court endorsed this rule in the 1927 patent case of De Forest Radio Telephone & Telegraph Co. v. United States, 273 U.S. 236. Consent to use a copyrighted work may be based on the copyright holder’s silence where the copyright holder knows of the use and encourages it.

Recall that Field knew that had he placed a “no archive” meta-tag on the pages of his Web site, Google would have known not to display “Cached” links to his pages. Nonetheless, Field specifically chose not to include the no-archive meta-tag on his site, knowing that Google would interpret this absence as permission to allow access to the pages via “Cached” links. The court said: “Thus, with knowledge of how Google would use the copyrighted works he placed on those pages, and with knowledge that he could prevent such use, Field instead made a conscious decision to permit it. His conduct is reasonably interpreted as the grant of a license to Google for that use.” [412 FSupp2d at 1116]

Does this ruling in the Field case mean the burden is always on the copyright holder to preemptively fend off those crawling or scavenging the Web for copyrighted material? Consider that the inclusion of a “no archive” meta-tag or the appropriate “robots.txt” file is relatively simple for the content owner, while, as the court said, “Given the breadth of the Internet, it is not possible for Google (or other search engines) to personally contact every Web site owner to determine whether the owner wants the pages in its site listed in search results or accessible through ‘Cached’ links.” [412 FSupp2d at 1112]
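For readers who have never seen one, here is a minimal sketch of what a “robots.txt” file looks like and how a well-behaved crawler might consult it before copying a page. The crawler name “ExampleArchiveBot,” the blog address, and the function name are all made up for illustration. I do not know what name Ancestry’s crawler used or whether it consulted robots.txt at all. The sketch uses Python’s standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a small genealogy blog.
# "ExampleArchiveBot" is an invented crawler name, not a real one.
ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://blog.example/2007/09/post.html"
    # The named crawler is shut out of the whole site.
    print(may_fetch("ExampleArchiveBot", page))
    # Any other crawler falls under the "*" rule and may read public pages.
    print(may_fetch("SomeSearchBot", page))
```

Note that nothing in the file enforces anything. It states the owner’s preference, and it works only if the crawler chooses to read and honor it.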

On the other hand, a copyright owner should have the right to choose which “distributors” or search engines the copyright owner wishes to grant a license. This would require knowledge of the use that the other party intended to make of the copyright holder’s content, as the Field court said. In the case of Ancestry’s IBC, no content owner knew in advance that Ancestry would make such use of their content.

On this last point, some have referred to The Generations Network’s Terms and Conditions, specifically this provision:

User provided content
Portions of the Service will contain user provided content, to which you may contribute appropriate content. For this content, Ancestry is a distributor only. By submitting content to Ancestry, you grant MyFamily.com, Inc., the corporate host of the Service, a license to the content to use, host, distribute that Content and allow hosting and distribution of that Content, to the extent and in that form or context we deem appropriate. Should you contribute content to the site, you understand that it will be seen and used by others under the license described herein. You should submit only content which belongs to you and will not violate the property or other rights of other people or organizations. MyFamily.com, Inc. is sensitive to the copyright of others.

In my view, nothing in that provision puts one on notice that Ancestry.com would use robots to crawl the Web in a manner similar to Google or other search engines. Indeed, the choice of the verbs “submit” and “contribute” suggests more than a passive or silent consent to use content.

Recall that Mr. Field set out to get Google to use his content so he could sue them for infringement!

But, one additional point on the responsibility of content owners to protect their content: the court points out that the use of meta-tags has been an industry standard “for years.” I can see a court in a future case using this fact to hold Web publishers responsible to protect their content by communicating their preferences to Web crawlers.
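The meta-tag the court calls “no-archive” is, in the form search engines commonly document it, a line in a page’s HTML such as `<meta name="robots" content="noarchive">`. As a rough sketch of how a crawler could check for that preference before offering a cached copy, consider the following, which uses Python’s standard html.parser module. The class and function names are my own inventions for illustration, and I am not suggesting this is how Google or Ancestry actually implemented anything.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" ...> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def allows_cached_copy(html: str) -> bool:
    """Return False if the page asks crawlers not to archive it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" not in parser.directives


if __name__ == "__main__":
    opted_out = '<html><head><meta name="robots" content="noarchive"></head></html>'
    silent = "<html><head><title>My family tree</title></head></html>"
    print(allows_cached_copy(opted_out))
    print(allows_cached_copy(silent))
```

The second page, which says nothing, is treated as permitting a cached copy. That default is exactly the point in dispute: the Field court read silence as permission, at least where the owner knew how to say no.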

The “Estoppel” Defense: Google put forth (successfully) a defense to copyright infringement known as “estoppel.” This means that: (1) the content owner knew of the allegedly infringing conduct; (2) the content owner intended that the alleged infringer should rely on the content owner’s conduct or acted in such a way that the alleged infringer had a right to believe it was so intended; (3) the alleged infringer was ignorant of the true facts; and (4) the alleged infringer relied on the content owner’s conduct to its detriment.

Put plainly, this means, for example, that the content owner acted in a manner that led the alleged infringer to believe that the content owner did not object to the allegedly infringing conduct and, in reliance on that, the alleged infringer went ahead with the conduct.

In the Field case, the success of this defense has much to do with Mr. Field’s (dishonest) conduct. But this defense could succeed where there is no dishonest conduct. For example, this morning, I discovered a rather new site called Blogoholix. It purports to be a “blog search engine.” There is a note on the main page which says “es.blogoholix.com is a blog search engine in development. The tech and design work is still in progress, so please send an e-mail to info@blogoholix.com if you have any suggestions on how to improve the site.” I found GeneaBlogie on that site. Suppose with that knowledge and the knowledge that I can prevent my blog from showing there, I do nothing, and the owner of that site continues to crawl my blog. I think a court following the Field reasoning would say that my silence is conduct that they are entitled to rely upon.

Well, that may be enough law for today. Tomorrow in Part 3, we’ll explain fair use and the Digital Millennium Copyright Act and take a very specific look at Ancestry’s IBC. After that, we wrap this up with Part 4 and some conclusions and suggestions.

Part 1 can be found here.

TOMORROW: Fair Use and The Digital Millennium Copyright Act Meet Ancestry.com

Notice: The information in this writing is intended for educational use only and is not intended as, nor should it be construed as, legal advice. If you have a legal problem, consult a lawyer admitted to practice in your state of residence. I am an active member of the bar of the State of California and am admitted to practice before the United States Supreme Court and various other federal courts. I am not licensed to practice in any other state. I am not presently soliciting or accepting new clients in the matters discussed above.