Google and Yahoo start indexing SWF's

Via: Theo Hultberg.

An odd story caught my attention recently, and I've been meaning to put my thoughts down.

I'm often asked the question about indexing flash content recently, and the recent announcement by google only increased the stir.

It's odd, because even Adobe employees seem completely clueless about what it means, and the implications.

Quote from Ryan Steward:

So what does that mean? We are giving a special, search-engine optimized Flash Player to Yahoo and Google which is going to help them crawl through every bit of your SWF file. This Flash Player will act just like a person would in some cases. It will click on your buttons, it will move through the states of your application, get data from the server when your application normally would, and it will capture all of the text and data that you’ve got inside of your Flash-based application. We’ve basically provided a very powerful looking glass into SWF files so Google and Yahoo can pull out meaningful information.

Is in sharp contrast with what google is saying:

We currently do not attach content from external resources that are loaded by your Flash files. If your Flash file loads an HTML file, an XML file, another SWF file, etc., Google will separately index that resource, but it will not yet be considered to be part of the content in your Flash file.

So essentially, google will index your SWF, but not the actual content it loads. Most modern Flash Apps don't hardcode any textual content these days, and will likely load most of their data from the servers. Most importantly, I feel the SWF should not be indexed at all. SWF is middleware, it is responsible for delivering content to the user, it (should not) be the actual content itself, for any serious web application.

One more gem from the Google blogposting:

That said, you should be aware that Google is now able to see the text that appears to visitors of your website. If you prefer Google to ignore your less informative content, such as a "copyright" or "loading" message, consider replacing the text within an image, which will make it effectively invisible to us.

So where is this coming from?

My guess is one argument between picking HTML vs. Flash to deliver your content, it could be said that a Flash is not SEO-friendly. Getting this message out allows pro-flash people to fight back a little. It definitely feels that this whole announcement has little to do with the technology, but much more with putting the Flash-brand in a better light.

How would one actually make SWF's SEO-friendly?

Just don't. Make sure the content is available on the web in an alternative format. Often your flash content is stored in a database (or an XML file for smaller sites). Pick up your favorite server-side scripting language, and make sure the content is also available in an indexable format. Using fancy CSS and Javascript usage you can make sure the content is replaced by the Flash content when a regular user visits.

If you do this, all the normal SEO rules are applied. As a side effect, the user also benefits from this as your content degrades nicely for older or for example mobile browsers, people with disabilities and you name it. The sole reason for this is that search engines are actually try to find 'quality content' based on your search query.

Last but not least, XHTML is a form of XML. If you use XHTML as a datasource for your content, search engines can also access it directly.