Saturday, June 28, 2008
by Nik Kalyani
Saturday, June 28, 2008 2:00:30 PM (Pacific Standard Time, UTC-08:00)
I have been learning how to create apps for Google AppEngine, relying mostly on videos. I figured other people might be interested in the same thing, so I set up a Ning site that aggregates AppEngine videos.

Here it is: http://www.AppEngine.tv

Enjoy.

by Nik Kalyani
Saturday, June 28, 2008 9:40:59 AM (Pacific Standard Time, UTC-08:00)
On the LinkedInBloggers group, there is an interesting discussion on how to prevent blog harvesting. Turning off RSS and subscription feeds seemed to be the suggested solution. I think this is impractical: it makes your blog harder to find and harder to read in RSS readers.

Is disabling RSS/subscription widgets really the only way? What if there were a simple way to ensure that your content displays in a browser only when it is served from your site, and that a browser showing it on a harvester's site is simply redirected back to your site?

I came up with a solution that might work. My solution is based on two assumptions:

1) RSS readers ignore JavaScript

2) Most blog engines have a templating feature that allows the URL of the blog post to be injected anywhere on the page containing the post

The solution is pretty simple:

Embed a simple script in your blog post that checks to see if the location where your blog content is being displayed is valid (i.e. your blog) or invalid (i.e. harvester site). If it is invalid, then redirect the browser to your blog.

Not only does this approach thwart harvesters (at least until they filter out the script), but it has the added benefit of getting the search traffic from the harvester's site back to your blog.

Let's walk through the changes you would make to your blog's template in order to enable this capability:

Original blog HTML:

This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog.



Steps for modifying blog HTML:

Step 1: Add DIV element wrapper for content

<div id="BlogContent" title="http://www.yourblogsite.com/URL-of-your-blog-post.htm">

This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog.
</div>

I used an id of "BlogContent", but you can use anything you want. If your blog displays the full text of more than one post on a page, each wrapper needs its own id, because ids must be unique within a page. In that case, try using "BlogContent{{ ID }}", where {{ ID }} is your blog engine's token for a unique identifier associated with the post. If you take this approach, be sure to change the "BlogContent" string in Step 2 to match.

Also, note the URL in the value of the "title" attribute of the <div> containing the blog content. You should not actually type in a URL there, but instead use the token feature of your blog engine that will inject the URL of the blog post page. Something like:

<div id="BlogContent" title="{{ PostURL }}">

({{ ID }} and {{ PostURL }} are not actual tokens; I just made them up. You will need to look at your blog engine's documentation to figure out the tokens you should use.)

This URL serves two purposes:
- It provides a standards-compliant way to include the original URL of your post in the content itself, so that no matter where the content is republished, the original URL is always in the HTML source code, and
- It gives the script in Step 2 a known place to find the original URL

Step 2: Embed script to foil harvesters

The script to embed is:

<script type="text/javascript">
var blogContent = document.getElementById("BlogContent");
if (location.href.toLowerCase().indexOf(blogContent.title.toLowerCase()) != 0) location.href = blogContent.title;
</script>

Here's what the script is doing:

a) Find the HTML element containing the blog content

var blogContent = document.getElementById("BlogContent");

b) Test if the content is running on the original site, if not, then redirect to the original site

if (location.href.toLowerCase().indexOf(blogContent.title.toLowerCase()) != 0) location.href = blogContent.title;
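To make the test in (b) easier to see in isolation, here is a minimal sketch that pulls the comparison into a small function. The function name isOnOriginalSite and the example URLs are made up for illustration; the comparison itself is the same case-insensitive "starts with" check the script performs.

```javascript
// Hypothetical helper: returns true when the page URL begins with the post URL,
// ignoring case. This mirrors the indexOf(...) != 0 test in the inline script.
function isOnOriginalSite(currentUrl, postUrl) {
  return currentUrl.toLowerCase().indexOf(postUrl.toLowerCase()) === 0;
}

// Example URLs are assumptions, not real addresses.
var postUrl = "http://www.yourblogsite.com/URL-of-your-blog-post.htm";

// Same page: no redirect.
console.log(isOnOriginalSite(postUrl, postUrl));

// Different letter case plus a query string: still counts as the original page.
console.log(isOnOriginalSite(
  "HTTP://WWW.YOURBLOGSITE.COM/url-of-your-blog-post.htm?source=feed", postUrl));

// Copied to another site: the script would redirect.
console.log(isOnOriginalSite("http://harvester.example/copied-post.htm", postUrl));

// Same post reached without "www": the prefix no longer matches.
console.log(isOnOriginalSite(
  "http://yourblogsite.com/URL-of-your-blog-post.htm", postUrl));
```

Because this is a prefix test on the whole URL, a visitor who reaches your post through a different host name (for example, without "www") is also sent to the URL in the title attribute. That is usually harmless, but it is worth making sure the injected URL is the one you want everyone to land on.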

Here's what the final content might look like:

<div id="BlogContent" title="{{PostURL}}">

This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog.
<script type="text/javascript">
var blogContent = document.getElementById("BlogContent");
if (location.href.toLowerCase().indexOf(blogContent.title.toLowerCase()) != 0) location.href = blogContent.title;
</script>

</div>
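One caveat for the multi-post case mentioned in Step 1: on a page of your own blog that lists several full posts, the page URL will not begin with any single post's URL, so the test above would redirect visitors away from your own listing page. The sketch below is one possible variation, not the script described in this post: it compares host names only. The function name findRedirectTarget, the sample data, and the use of the URL constructor (available in modern browsers and Node, not in older browsers) are all assumptions, and each title is assumed to hold an absolute URL.

```javascript
// Hypothetical variation for pages that show several posts.
// wrappers: array-like of elements (or plain objects) with id and title.
// Returns the URL to redirect to, or null when no redirect is needed.
function findRedirectTarget(wrappers, currentUrl) {
  var currentHost = new URL(currentUrl).hostname.toLowerCase();
  for (var i = 0; i < wrappers.length; i++) {
    var w = wrappers[i];
    // Only consider wrappers named "BlogContent..." that carry a URL.
    if (!w.id || w.id.indexOf("BlogContent") !== 0 || !w.title) continue;
    var postHost = new URL(w.title).hostname.toLowerCase();
    if (postHost !== currentHost) return w.title; // content is on a foreign host
  }
  return null; // every post is being shown on its own host
}

// Made-up sample data standing in for the wrappers on a page.
var samplePosts = [
  { id: "BlogContent1", title: "http://www.yourblogsite.com/post-1.htm" },
  { id: "BlogContent2", title: "http://www.yourblogsite.com/post-2.htm" },
  { id: "Sidebar", title: "" }
];

console.log(findRedirectTarget(samplePosts, "http://www.yourblogsite.com/"));
console.log(findRedirectTarget(samplePosts, "http://harvester.example/page"));
```

In a browser you might call findRedirectTarget(document.getElementsByTagName("div"), location.href) and assign a non-null result to location.href. The trade-off is that a host-only comparison no longer sends visitors to the exact post when the content appears elsewhere on your own site.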

Step 3: (Optional) Putting the script in a separate file

Instead of placing the script in each blog post as described above, you can put it in a separate file such as harvestblock.js. This reduces page size because the script is not repeated in every post. The file only needs to contain the script body, without the <script> tags:

var blogContent = document.getElementById("BlogContent");
if (location.href.toLowerCase().indexOf(blogContent.title.toLowerCase()) != 0) location.href = blogContent.title;

If you do this, the revised content might look like:

<div id="BlogContent" title="{{PostURL}}">
This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog. This is the content of my blog.
<script type="text/javascript" src="http://www.yourblog.com/harvestblock.js"></script>
</div>

Note: The URL used for the script must be a fully-qualified URL because it must work no matter whether the content is running on your site or on the harvester's site.
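If the script lives in a shared file, it may end up loading on pages where the wrapper is missing, and the two-line version would then throw an error on the null element. Here is a slightly more defensive sketch of what harvestblock.js could contain. The function name blockHarvest and its doc/loc parameters are assumptions added so the logic can be exercised outside a browser; the comparison is unchanged.

```javascript
// Sketch of harvestblock.js. doc and loc stand in for the browser's
// document and location objects (hypothetical parameters for testability).
function blockHarvest(doc, loc) {
  var blogContent = doc.getElementById("BlogContent");
  // If the wrapper or its title is missing, do nothing rather than throw.
  if (!blogContent || !blogContent.title) return false;
  var original = blogContent.title;
  if (loc.href.toLowerCase().indexOf(original.toLowerCase()) !== 0) {
    loc.href = original; // in a browser, assigning href navigates to the post
    return true;
  }
  return false;
}

// Run automatically only when a real browser environment is present.
if (typeof document !== "undefined" && typeof location !== "undefined") {
  blockHarvest(document, location);
}
```

The return value (true when a redirect was triggered) is only there to make the behavior easy to check; the browser ignores it.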



Let's look at what happens when you make this change:

1) User looking at content on your site

The script will detect a match between the URL being displayed in the browser and the URL of the blog post. As a result, it will do nothing and there will be no change in behavior from what your users are already seeing.

2) User looking at content in their RSS reader

The script will not run and as a result there will be no change in behavior from what your users are already seeing.

3) User looking at content on harvester site

The script will detect a mismatch between the URL being displayed in the browser and the URL of the blog post. As a result, it will redirect the user to the original blog post.

This solution is not foolproof. If a harvester strips scripts embedded in a blog post, it will not work. I doubt this will happen very often, because most harvested content is simply the RSS feed content as-is.

If you employ this solution please provide information on the specific token you use with your blogging engine in the comments.


 Saturday, April 26, 2008
by Nik Kalyani
Saturday, April 26, 2008 10:48:18 PM (Pacific Standard Time, UTC-08:00)

As I go through the process of getting familiar with Google AppEngine, I'll post interesting things I learn (mostly so I can find them later). Here's a quick note on GQL query syntax:

Assuming a model called Customer, you can use:

1) customers = db.GqlQuery("SELECT * FROM Customer ORDER BY name LIMIT 10")

2) Since GQL queries always return whole entities of the model, you can drop the SELECT * FROM Customer prefix and call gql() on the model class:

customers = Customer.gql("ORDER BY name LIMIT 10")

3) You can also use parameters like this:

customers = Customer.gql("WHERE name = :1 ORDER BY name LIMIT 10", "Smith")

4) And finally, #3 with named parameters like this:

customers = Customer.gql("WHERE name = :person ORDER BY name LIMIT 10", person="Smith")

 Sunday, April 06, 2008
by Nik Kalyani
Sunday, April 06, 2008 8:50:10 PM (Pacific Standard Time, UTC-08:00)



Found this little gem on my Facebook feed today. Glad that Shaun has finally come around and become a fan of DotNetNuke. ;-)
 Tuesday, April 01, 2008
by Nik Kalyani
Tuesday, April 01, 2008 11:45:33 AM (Pacific Standard Time, UTC-08:00)
Got word today that I have received the Microsoft MVP Award for the third time. Of course it is April 1, so it could all be a big joke. I am hoping not.

Dear Nik Kalyani,  

Congratulations! We are pleased to present you with the 2008 Microsoft® MVP Award! The MVP Award is our way to say thank you for promoting the spirit of community and improving people’s lives and the industry’s success every day. We appreciate your extraordinary efforts in ASP/ASP.NET technical communities during the past year...

I am honored. Thanks Microsoft.


Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2008
Nik Kalyani