UTF-8 based transformation output in .Net
by Pascal Opitz on August 17 2006, 11:38
Using XSL transformations in .Net, I came across the weird behaviour that my transformation output was UTF-16 encoded even though I had specified UTF-8 in the <xsl:output /> tag.
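This left me a bit speechless, and I assumed it could only be a .Net bug. After a bit of research, however, I found it to be a result of .Net being very strict about character encodings: when the transformation writes to a TextWriter, the encoding of that writer takes precedence over the encoding requested in <xsl:output />.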
In the following example, the Encoding property of the StringWriter reports UTF-16 (a System.Text.UnicodeEncoding), so the output will be UTF-16 as well, no matter which character set I specify in the XSL.
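// xml (the input document) and args (an XsltArgumentList) are assumed to be set up elsewhere
XslTransform xslt = new XslTransform();
// xslt.Load(...) with the stylesheet is omitted here
StringWriter output = new StringWriter();
// StringWriter.Encoding is UTF-16, so the result declares encoding="utf-16"
xslt.Transform(xml, args, output);
String code_transformed = output.ToString();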
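To see the behaviour in isolation, here is a minimal, self-contained sketch. It uses XslCompiledTransform, which replaced the now-obsolete XslTransform in .Net 2.0, together with a hypothetical inline stylesheet that asks for UTF-8; the class and method names are made up for illustration. Writing the result to a plain StringWriter should still produce an encoding="utf-16" declaration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class StringWriterEncodingDemo
{
    // Hypothetical stylesheet that explicitly requests UTF-8 output.
    const string Xsl =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='xml' encoding='UTF-8'/>" +
        "<xsl:template match='/'><result><xsl:value-of select='/doc'/></result></xsl:template>" +
        "</xsl:stylesheet>";

    public static string TransformToStringWriter(string xml)
    {
        var xslt = new XslCompiledTransform();
        using (var xslReader = XmlReader.Create(new StringReader(Xsl)))
        {
            xslt.Load(xslReader);
        }

        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            // The TextWriter's encoding (UTF-16 for StringWriter) overrides
            // the encoding attribute of xsl:output.
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        // StringWriter always reports UTF-16.
        Console.WriteLine(new StringWriter().Encoding.WebName);
        Console.WriteLine(TransformToStringWriter("<doc>hello</doc>"));
    }
}
```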
Steven Livingstone pointed out that, since the Encoding property of System.IO.StringWriter is read-only, one has to provide a different output target, such as a Stream, to receive the transformation output if it is to be encoded in UTF-8:
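XslTransform xslt = new XslTransform();
// A Stream has no encoding of its own, so the <xsl:output /> encoding (UTF-8) is used
MemoryStream ms = new MemoryStream();
xslt.Transform(xml, args, ms);
// Rewind the stream before reading the result back
ms.Position = 0;
// Decode the UTF-8 bytes; the String is UTF-16 in memory, but its XML declaration says utf-8
StreamReader sr = new StreamReader(ms, Encoding.UTF8);
String code_transformed = sr.ReadToEnd();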
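Here is a self-contained sketch of the Stream approach, again swapping in XslCompiledTransform and a hypothetical inline stylesheet. The bytes in the MemoryStream are UTF-8, and decoding them gives a String whose declaration says utf-8. Keep in mind that a .Net String is always UTF-16 in memory, so if you need actual UTF-8 bytes (for a file or an HTTP response, say), it may be simpler to keep the byte array than to round-trip through a string.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

public static class MemoryStreamTransformDemo
{
    // Hypothetical stylesheet that requests UTF-8 output.
    const string Xsl =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='xml' encoding='UTF-8'/>" +
        "<xsl:template match='/'><result><xsl:value-of select='/doc'/></result></xsl:template>" +
        "</xsl:stylesheet>";

    // Transforms into a MemoryStream; with no TextWriter involved,
    // the encoding from xsl:output determines the bytes written.
    public static byte[] TransformToBytes(string xml)
    {
        var xslt = new XslCompiledTransform();
        using (var xslReader = XmlReader.Create(new StringReader(Xsl)))
        {
            xslt.Load(xslReader);
        }

        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var ms = new MemoryStream())
        {
            xslt.Transform(input, null, ms);
            return ms.ToArray();
        }
    }

    // Decodes the UTF-8 bytes; StreamReader skips a byte order mark if one is present.
    public static string DecodeUtf8(byte[] bytes)
    {
        using (var sr = new StreamReader(new MemoryStream(bytes), Encoding.UTF8))
        {
            return sr.ReadToEnd();
        }
    }

    public static void Main()
    {
        byte[] bytes = TransformToBytes("<doc>hello</doc>");
        Console.WriteLine(DecodeUtf8(bytes));
    }
}
```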
Another possibility is to subclass StringWriter and override its Encoding property, as suggested on Robert McLaws' FunWithCoding.Net. This would read as follows:
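using System;
using System.IO;
using System.Text;

namespace MyAwesomeNamespace
{
    // A StringWriter that reports a caller-chosen encoding instead of UTF-16
    public class StringWriterWithEncoding : StringWriter
    {
        private readonly Encoding _enc;

        public StringWriterWithEncoding(Encoding NewEncoding) : base()
        {
            _enc = NewEncoding;
        }

        // Writers such as XmlWriter read this property to decide what goes
        // into the XML declaration; the underlying string is still UTF-16
        public override Encoding Encoding
        {
            get
            {
                return _enc;
            }
        }
    }
}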
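The subclass can be tried out with a small harness like the one below. This sketch repeats the class (outside any namespace) so it compiles on its own, and once more uses XslCompiledTransform with a hypothetical inline stylesheet; the demo class and method names are made up. The declaration in the result should now read utf-8.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

// Same idea as the StringWriterWithEncoding class from the article.
public class StringWriterWithEncoding : StringWriter
{
    private readonly Encoding _enc;

    public StringWriterWithEncoding(Encoding encoding)
    {
        _enc = encoding;
    }

    // XmlWriter consults this property when writing the XML declaration.
    public override Encoding Encoding
    {
        get { return _enc; }
    }
}

public static class StringWriterWithEncodingDemo
{
    // Hypothetical stylesheet that requests UTF-8 output.
    const string Xsl =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='xml' encoding='UTF-8'/>" +
        "<xsl:template match='/'><result><xsl:value-of select='/doc'/></result></xsl:template>" +
        "</xsl:stylesheet>";

    public static string Transform(string xml)
    {
        var xslt = new XslCompiledTransform();
        using (var xslReader = XmlReader.Create(new StringReader(Xsl)))
        {
            xslt.Load(xslReader);
        }

        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriterWithEncoding(Encoding.UTF8))
        {
            // The writer now reports UTF-8, so the declaration says utf-8.
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(Transform("<doc>hello</doc>"));
    }
}
```

The string returned is still UTF-16 in memory; only the declaration changes. If it is later written to a file or stream, that writer should use UTF-8 too, otherwise the declaration and the actual bytes will disagree again.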