[Wtr-general] W3C validation problems

Cain, Mark Mark_Cain at rl.gov
Mon Apr 3 17:18:18 EDT 2006


You should be able to do:

ie = IE.new
ie.goto 'http://validator.w3.org/'
ie.html

--Mark

-----Original Message-----
From: wtr-general-bounces at rubyforge.org [mailto:wtr-general-bounces at rubyforge.org] On Behalf Of Jørgen Bang Erichsen
Sent: Monday, April 03, 2006 2:01 PM
To: wtr-general at rubyforge.org
Subject: [Wtr-general] W3C validation problems

Hi,

Inspired by 

http://redgreenblu.com/svn/projects/assert_valid_markup/lib/assert_valid_markup.rb

I would like to have an easy way to validate the HTML of the page IE is
currently showing. Unfortunately, I have a problem with the HTML that
ie.document.body.parentelement.outerhtml returns :-(

Take a look at the following example:

require 'test/unit'
require 'watir'
require 'net/http'
require 'cgi'
require 'xmlsimple'

class ValidationExample < Test::Unit::TestCase
  include Watir
  
  def test_w3c_validate
    ie = IE.new
    ie.goto 'http://validator.w3.org/'
    # Serialized DOM as IE holds it, not the original page source
    html = ie.document.body.parentelement.outerhtml
    # Submit the markup to the validator and ask for XML output
    response = Net::HTTP.start('validator.w3.org').post2('/check', "fragment=#{CGI.escape(html)}&output=xml")
    markup_is_valid = response['x-w3c-validator-status'] == 'Valid'
    message = markup_is_valid ? '' : XmlSimple.xml_in(response.body)['messages'][0]['msg'].collect { |m|
      "Invalid markup: line #{m['line']}: #{CGI.unescapeHTML(m['content'])}"
    }.join("\n")
    assert markup_is_valid, message
  ensure
    # Close the browser even when the assertion fails
    ie.close if ie
  end
  
end

When I run the example I get stuff like:

Invalid markup: line 1: no document type declaration; implying "<!DOCTYPE HTML SYSTEM>"
Invalid markup: line 1: there is no attribute "XML:LANG"
Invalid markup: line 1: there is no attribute "XMLNS"

The html returned by ie.document.body.parentelement.outerhtml is

<HTML lang=en xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<HEAD>
  <TITLE>The W3C Markup Validation Service</TITLE>
  <LINK rev=made href="mailto:www-validator at w3.org">
  <LINK title="Home Page" rev=start href="./">

but if I view the source from IE itself it is something like

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>The W3C Markup Validation Service</title>
    <link rev="made" href="mailto:www-validator at w3.org" />
    <link rev="start" href="./" title="Home Page" />
...

The XML declaration and the DOCTYPE line are missing, the tag names are
upper-cased, and several attribute quotes have been dropped. Is there any
way to get the unmodified HTML for the current page?
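
One workaround I can imagine, as an untested sketch only: fetch the page
source a second time over plain HTTP and validate that, instead of the DOM
serialization IE hands back. The helper names below (fetch_raw_source,
validator_form_body, doctype?) are made up for illustration, and the sketch
assumes the page can be fetched without the browser's cookies or session.

```ruby
require 'net/http'
require 'uri'
require 'cgi'

# Hypothetical helper: fetch the page source over HTTP rather than asking
# IE for its DOM serialization. Assumes the URL is reachable without the
# browser's cookies or session state.
def fetch_raw_source(url)
  Net::HTTP.get(URI.parse(url))
end

# Hypothetical helper: build the form body for the validator's /check
# endpoint, using the same two fields as the test above.
def validator_form_body(html)
  "fragment=#{CGI.escape(html)}&output=xml"
end

# Hypothetical helper: true when the markup contains a DOCTYPE declaration,
# which is a quick way to tell source from IE's serialized DOM.
def doctype?(html)
  !!(html =~ /<!DOCTYPE\s/i)
end
```

In the test this would be used roughly as html = fetch_raw_source(ie.url)
followed by post2('/check', validator_form_body(html)), assuming ie.url
returns the address of the page currently shown. The obvious drawback is
that pages behind a login or built from a form POST cannot be re-fetched
this way.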

If people are doing automatic validation any other way I am open
to suggestions.

Best regards,

Jørgen
