Fair Use, the AP, and the Internet: Why the AP Isn’t Wrong
Posted on June 13, 2008
The AP isn't wrong.
For some odd reason, there is a viewpoint in Web 2.0 that as long as everyone seems to be doing it, the laws should be changed, or somehow re-interpreted. High on the list of those laws that the horde is clamoring to overturn are those dealing with copyright infringement and libel.
The issues regarding copyright and fair use rear their collective head fairly regularly when it comes to online content, with the music industry as well as the writing industry trading places as the bitchmeme du jour. This time up, it's the writing industry, and the giant in this David and Goliath argument is the Associated Press, which filed seven DMCA take-down notices against Drudge Retort.
The argument being made is that Drudge Retort is a news aggregator of the same design as Digg or Reddit, and that since the full text of the articles wasn't appropriated (only excerpts), it is covered under Fair Use. Bloggers far and wide are decrying the AP as a big bully misinterpreting copyright law, but I believe the misinterpretation isn't on the AP's part.
News aggregated in Drudge Retort is user-submitted, with the headline usually serving as the link back to the original source. The excerpts are likewise supplied by the submitter, and usually do not reference the originating source. In their arguments, most bloggers are turning to interpretations of the Fair Use sections of copyright law rather than to the actual sections themselves.
The actual section of U.S. copyright law on Fair Use is § 107, "Limitations on exclusive rights: Fair use," which states:
Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include —
(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
(2) the nature of the copyrighted work;
(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
(4) the effect of the use upon the potential market for or value of the copyrighted work.
The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.
Note how vague that is, but also, what it includes. A news aggregator is dubiously included under the "criticism, comment" text since it does include a user forum dedicated to discussion of the article. However, you'll notice that the fourth factor includes "the effect of the use upon the potential market."
Bloggers know full well that their content will likely be scraped at some point, some for the good, but most for the bad. The majority of that scraping ends up happening after the article isn't as relevant: days or weeks after it is originally posted. It can alter search ranking on the article, but it can also help, which is why splogs are the enemy but All Things Digital's Voices is not. One exists solely to make money by stealing others' content, while the other exists to highlight content from smaller places than the WSJ.
The AP is arguing (and will argue) that by including enough of the story for Drudge Retort's users to understand the original article without having to trouble themselves by reading it, they are devaluing the content. That is a valid argument. Submitters can grab and repost that content faster than any splog bot can scrape it, meaning it can appear almost as quickly as the AP content does.
The length of the excerpts isn't the issue: although we were taught in school that seven identical words in a row was plagiarism, copyright law doesn't give any specific length. The argument that the full articles weren't reproduced, or that "everyone does it" isn't a valid argument.
Everything in the world can't be free. Arguing that the AP doesn't always attribute doesn't mean that two wrongs make a right. Being too lazy to write your own summary of an article, and instead posting an excerpt that grabs enough of the article that lazy readers don't have to bother reading the full piece, is not, and should not be, covered under Fair Use.

Cyndy, I obviously disagree. Whether or not a specific length is described in the law as it pertains to fair use, the courts have shown that short excerpts are protected, except in cases where the person doing the excerpting is doing it for commercial gain.
The bottom line is that Drudge Retort is not a commercial entity, the work in question has already been published and the “amount and substantiality” of the use is incredibly small in relation to the whole work. As for the effect on the potential market for the content, I find it hard to believe that Drudge Retort quoting 20 or 30 words is going to mean no one pays for AP wire services any more — especially since the users of that site aren’t the intended market for AP’s business in the first place.
Do you also think that Google News is (or should be) illegal under copyright laws?
Mathew, I assumed that you would.
The Drudge Retort runs ads, which makes it a commercial entity, whether it actually turns a profit or not. While the site itself is protected from suit under safe harbor laws, that doesn’t make it exempt from DMCA take-downs.
There are big sites who argue that Google News is in violation of copyright law. For a site like Profy, which enjoys increased traffic from a feature on Google News, it’s obviously a win. Other agencies, like the AFP, don’t necessarily see it that way, and Google was successfully sued in the Belgian courts. The U.S. copyright law includes that risk/benefit analysis as part of fair use for a reason. For smaller sites, Google News is a benefit. For an agency like the AP, it may not be.
From my reading of past fair use cases, there is very little chance that Drudge Retort would qualify as a “commercial entity” simply because it runs a couple of small ads. By that measure, every blogger with AdSense is a commercial entity — which, if true, means that there will be plenty of work for copyright lawyers until the earth spirals into the sun.
As for the impact on the market, my sense (and obviously, part of the problem is that all this is open to interpretation by the courts) is that it isn’t strictly speaking a “cost/benefit” analysis. According to Stanford’s Fair Use overview, the court looks at “whether your use deprives the copyright owner of income or undermines a new or potential market for the copyrighted work.” I would argue that neither of those conditions is fulfilled in this case.
New York State argues that a company is doing business in the state if any person there holds one of its affiliate accounts, as with Amazon, even if the affiliate doesn’t make any money, so yes, anyone with an AdSense account is a commercial entity. Also, the argument is going to be the dilution of value. The AP makes money by selling their content to newspapers and web sites. If that content is immediately available somewhere else in a condensed form, the value of paying for that content (or image) is diluted pretty quickly. How many smaller sites or papers could feasibly drop their subscriptions, grab the content from somewhere big, and then post it under fair use if your interpretation is the correct one? That’s a two-prong deprivation of income.
Well, what New York State argues and what the courts are willing to accept could be two entirely different things, wouldn’t you agree? In any case, we have no way of knowing. I’m simply saying that a court would likely take into account whether arriving at such a definition of “commercial entity” wouldn’t stretch the meaning of the term to such a point that it would become virtually meaningless, not to mention opening up a vast territory of relatively innocent behaviour to litigation.
As for the AP’s business being affected, copyright law is not designed specifically to protect anyone’s existing business model. The dilution of the market for AP’s content is just one part of the overall picture — and the courts have decided in the past that in certain cases, newsworthy events are exempt from copyright in some limited form. I would argue that that factor, combined with the small amount of the work being used, would protect Drudge Retort. Could the AP really be arguing that two or three sentences from a lengthy article strip that article of all its value and make it worthless? I find that hard to believe.
The Drudge Retort is a hobby that got out of hand. It makes enough to keep the servers running and a little part-time income.
The issue that prompted me to go public with the AP dispute was this: If I’m interested in what Cyndy has written here, and I copy the lead paragraph of her blog entry and link it on my blog, am I committing a copyright violation?
What AP calls a copyright violation and “hot news” misappropriation is common practice on the web. I think it’s also legal behavior that enriches the linked source and helps the public share and evaluate current events.
AP wants to fill in some facts and perspective on its recent actions with the Drudge Retort, and also reassure those in the blogosphere about AP’s view of these situations. Yes, indeed, we are trying to protect our intellectual property online, as most news and content creators are around the world. But our interests in that regard extend only to instances that go beyond brief references and direct links to our coverage.
The Associated Press encourages the engagement of bloggers — large and small — in the news conversation of the day. Some of the largest blogs are licensed to display AP stories in full on a regular basis. We genuinely value and encourage referring links to our coverage, and even offer RSS feeds from http://www.ap.org, as do many of our licensed customers.
We get concerned, however, when we feel the use is more reproduction than reference, or when others are encouraged to cut and paste. That’s not good for original content creators; nor is it consistent with the link-based culture of the Internet that bloggers have cultivated so well.
In this particular case, we have had direct and helpful communication with the site in question, focusing only on these issues.
So, let’s be clear: Bloggers are an indispensable part of the new ecosystem, but Jeff Jarvis’ call for widespread reproduction of wholesale stories is out of synch with the environment he himself helped develop. There are many ways to inspire conversation about the news without misappropriating the content of original creators, whether they are the AP or fellow bloggers.
Jim Kennedy
VP and Director of Strategy for AP
@Rogers My feeling is that, yes, it is. You aren’t offering commentary or parody or anything else on it. You are just commenting on it. Add in the “protest” encouraging EVERYONE to gank an AP piece and it just looks ridiculous. I’m pretty sure we were all taught proper quoting and attribution in school, yet this Wild West mentality online tries to justify it with the “common practice” argument. It’s also common practice to speed on the expressway; that doesn’t make it right or legal.
Paul, you say that you “value and encourage referring links” but that you get concerned when the use is “more reproduction than reference” and when others “are encouraged to cut and paste.” There’s a universe of interpretation in between those statements. How much are we allowed to refer to — a sentence? Two sentences? A paragraph?
@Mathew The NYS Courts are going to hear both the sales tax issue as well as this one if pressed, which is why I referenced one with the other. My understanding is that the notices referenced DMCA under Federal law, but the “hot news” misappropriation falls under State law.
Cyndy, I think everyone knows when something has been “ganked,” as you put it, and that’s not even close to what we’re talking about at Drudge Retort. Is quoting a 30-word chunk — with a link to the original source, let’s remember — the same thing as stealing an entire article, or even the essence of an article? I find that hard to believe.
@Mathew Can you tell I was responding quickly?
There is a difference between quoting, which implies using a selection in context of commentary or reporting, and posting that same 30-word chunk. In an ideal world, the excerpt wouldn’t be the entire gist of the article, every reader would click through and read the original, and probably everyone would be happy, from content creators to advertisers. The reality is that yes, often that 30-word chunk is a quick summary of the article, few ever click through, and the link to the source isn’t noted. When I first checked out the site in question, I didn’t even notice the link, to be honest. I don’t have underlining turned on, and the colors appear very close on my monitor. I found the link by hovering my mouse anywhere I thought I might find it to give them the benefit of the doubt that a source link was actually there.
Then I guess the solution is obvious: writers whose work is picked up by the AP need to do a better job of making their stories appealing enough that people will click through to read them, regardless of which series of sentences is excerpted on a blog.
The only issues I can see with Drudge Retort are that the summaries don’t textually cite the source, nor do they make it clear that what we’re seeing is a direct quote. That could be fixed by simply telling DR contributors to use a bookmarklet that prefills the posting form with both the quoted material and the originating page’s title (which usually includes the source’s name) or the source’s domain name. It also wouldn’t hurt if the entire quote was an active link.
As for specific points:
Cyndy: “Bloggers know full well that their content will likely be scraped at some point…”
When a blogger is scraped, the full text of an entry is copied, usually without any link back to the source material. A quote is a completely different animal, with a completely different purpose.
Cyndy: “…by including enough of the story for Drudge Retort’s users to understand the original article without having to trouble themselves by reading it, they are devaluing the content.”
I just browsed through the DR site, and spotted multiple instances of entire articles being reproduced in the comments section with little or no attribution, and no links. I have absolutely zero problem with the authors of such articles raising a stink… they’re getting neither credit nor link-love.
But in the example of a disputed posting that Rogers provided on his blog, the AP’s quoted content is merely a wordy rephrasing of their own title, and thus isn’t doing any (highly dubious, IMO) damage that the title alone wouldn’t do. The remainder is a quote from Hillary Clinton, who hopefully won’t file suit against the AP for devaluing her speeches with their cherry-picked subset of her comments.
(The AP itself provides RSS feeds containing similar excerpts. If their argument is that the DR posting devalued their content, then they’re clearly a self-destructive organization.)
It’s also worth pointing out that if you View Source on the article in question, you’ll find that Yahoo News has placed the same AP-owned content from the DR post in the description meta tag. That means they want Google (and any other link aggregation services) to use that quote in their listings. So the AP clearly needs to have a sit-down with some of their licensees.
Cyndy: “The argument that the full articles weren’t reproduced, or that “everyone does it” isn’t a valid argument.”
As I’ve said elsewhere, the web is just one big copyright infringement. The proxy servers, browser caches, search engines, and so on that make the whole thing functional are, under any strict interpretation of copyright law, a violation. In reality, “everyone does it” isn’t just a valid argument… it’s a necessity. In a networked environment like the WWW, the only real test for fair use is intent.
“We get concerned, however, when we feel the use is more reproduction than reference, or when others are encouraged to cut and paste. That’s not good for original content creators; nor is it consistent with the link-based culture of the Internet that bloggers have cultivated so well.”
I’m with you on this one, Paul. It outlines the problem I have with so many of the new social media darlings like Disqus and Shyftr. At what point does content reference or comment help become content stealing and affect the blogger/ original source negatively?
@Roger On your first point, no, that isn’t always the case. I’ve had posts scraped as excerpts, as well as scraped, tossed through a translator and back out, and then posted as a nearly unintelligible mess. There are a lot of ways to scrape.
As for the content in RSS feeds, a) the author is making the decision to provide that feed as a service for readers, and is still technically within their right to reproduce under copyright, and b) they can monetize their feed content themselves, rather than having others monetize it.
Claiming that “cache” is a copyright infringement is reaching. What I’d like to know, since apparently everyone thinks all creative content should just be free, is how anyone expects that content to continue to be produced? If writers should provide their content for free for anyone to reproduce at their discretion and musicians should do the same, along with television and movies, what is the incentive to continue to produce? Just artistic expression?
Cyndy Aleo-Carreira said that blogs don’t offer commentary, they simply comment. To be exact, “You aren’t offering commentary or parody or anything else on it. You are just commenting on it.” Truly, not the level of analysis I had hoped for. I hope that doesn’t go into any briefs filed with the Court.
Ok, that was a somewhat low blow. Valid, but low. I apologize. Seriously, though, let me note one underlying premise that seems to be missing from this commentary: Linking on the internet is an undisputed increase in value of an internet commodity. What AP wants is the link, which increases value, without the quote, which may decrease value.
The argument that direct quotes, no matter how small, may decrease the value of the quoted piece, is a valid argument, but in the internet culture, it’s an argument that simply fails given the context. The “freewheeling nature” of the internet puts the onus on anyone who wishes to be a respected commentator to quote and link directly. I cannot count how many sites out there misquote, selectively quote, distort, misread, change, reinterpret, reframe, undermine, or otherwise misrepresent an article. The medium is full of charlatans and tricksters and malicious writers, writing for their own undisclosed ends. If anyone wants to be taken at all seriously, there will be a quote and a link.
This is the key insight. The fact that “Everyone is doing it,” is not in and of itself a good argument. The argument is, “Everyone is doing it FOR A REASON.” That reason is, no one would believe you, or read you, on the internet, without that quote and link. On the internet, a post often isn’t worth reading if it only consists of a link and a summary, without a quote to back up whether the individual poster is full of it. And on the internet, the presumption is, Mr. Anonymous is full of it, until proven otherwise. In essence, everyone is doing it because it is the only way to ascertain legitimacy in an otherwise lawless, incessant flow of words.
So, Factor Four of the fair use balancing test, whether short quotes plus links increase or decrease the value of the work quoted, will see the following issues in play: (1) whether the quote actually decreases linking to the source, thus diluting the source’s value, (2) whether the link increases the value, and in such a way as to compensate the source for the dilution, if any exists, and (3) whether the quote and the link are so helpful, given the internet environment, that they outweigh any loss in value to the source article, if such loss exists.
The remaining three factors in the Fair Use test fall squarely on the side of the bloggers.
That is funny. When a search engine or aggregator takes pieces of content, it is called “indexing,” but if someone you don’t like or someone who you think you can get money from is taking small pieces of content it is called “scraping.” AP, if you want to play in the “link-based culture of the Internet that bloggers have cultivated,” you play by our rules or just stick to print.
@Bribes I’m getting the feeling that you either read something I said out of context, or it’s from someone else misinterpreting what I said. In no way, shape, or form did I EVER say that blogging doesn’t fall under Fair Use, and since that’s not what the AP filed their DMCA notices over, I’m failing to understand why the blogosphere has their collective panties in a bunch. What I said was that many aggregators, including Drudge Retort, were using content as submissions with no additional commentary. THAT does not fall under Fair Use as I understand it.
@Nick I think there is a big difference between search engine and aggregator. I don’t think there are very many people out there who would think Google actually generated the content, but that doesn’t hold true for user-submitted aggregators or splogs, where it could easily be confused.
@Cyndy Aleo-Carreira,
You’ll find that statement under your name in the comment posted on June 13th, 2008, at 1:03 pm, in this comment thread. Scroll up and ye shall find it. Now, yes, I took it out of context, thus the tongue-in-cheek nature of the first paragraph (notice I started with only a summary *cough* and then followed with a direct quote lasting more than 5 words), and the immediate apology. But, it wasn’t that far out of context. The full context is as follows:
Rogers Cadenhead wrote, “The issue that prompted me to go public with the AP dispute was this: If I’m interested in what Cyndy has written here, and I copy the lead paragraph of her blog entry and link it on my blog, am I committing a copyright violation?”
Cyndy Aleo-Carreira responded, “@Rogers My feeling is that, yes, it is. You aren’t offering commentary or parody or anything else on it. You are just commenting on it.”
Mr. Cadenhead has a blog. I checked his blog. For each and every post, there is usually personal commentary and also a comments section, and there are readers who comment on and discuss the posts, and I’m sure the blogger posts comments as well. That is one of the core things that political blogs do, and that is what you just argued to be copyright infringement. In essence, you just said that at least one central aspect of blogging, especially political blogging, falls outside of fair use and is a copyright infringement.
You responded to me, saying, “In no way, shape, or form did I EVER say that blogging doesn’t fall under Fair Use.” Now, this surprised the hell out of me. Happily, you clarified by writing, “What I said was that many aggregators, including Drudge Retort, were using content as submissions with no additional commentary. THAT does not fall under Fair Use as I understand it.”
And that goes to the heart of my original post: Implied in the joke is the question, what is a “comment”? Given the nature of the internet and the blogosphere in particular, and given generally accepted definitions of “comment,” your definition of commentary is too narrow.
Notice how central quoting was to writing my response. I could not have written it by simply summarizing Cyndy’s words. No one would have believed me, nor wasted the time looking around to confirm my arguments, because I’m Mr. Anonymous. I have NO credibility. Why should anyone believe me??
Luckily, I can quote what someone else said, and the quotes give my argument credibility, and perhaps make it worth checking out the source material. In this case, a simple text search of this commentary page will do.