Internationalization and Multilingualism in Web Standards
Larry Masinter
Palo Alto Research Center
November 1996
Purpose of talk
- Overview of Web Standards
- Set context for Authoring, Management, Deployment domains
- Current status of infrastructure
- Open issues in I18N
First: What is the Web?
- One network, everyone on it
- Mixed modes of communication
- Multiple media
One Network, Everyone on it
Mixed modes of communication
- Publish, retrieve
- Send, receive
- Broadcast, filter
- Interact in real time
For multiple media
- Text
- Graphics
- Video
- Audio
Who makes Web Standards?
- Standards organizations
- Consortia
- Companies
- Individuals
Kinds of web standards
- Content
- what are the objects we're moving around?
- Protocols
- Naming
- how to reference something not in hand?
Standards for Web Content
- MIME
- HTML as a MIME type
- Internationalization issues
MIME: Multipurpose Internet Mail Extensions
- Originally designed for mail
- Allows
- Multiple media
- Multiple character sets
- Multiple languages
Internet Media Types ("MIME types")
- Standard way of naming data formats
- Hierarchical structure (type/subtype) with parameters
- Applications use the MIME type, instead of the file extension, to decide how to interpret data
- text, image, audio, video, multipart, application
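As a rough illustration of the "hierarchical structure with parameters" point, the sketch below splits a media type string into its major type, subtype, and parameters using Python's standard `email` package. The helper name `parse_media_type` is made up for this example.

```python
from email.message import Message


def parse_media_type(header_value):
    """Split a Content-Type value into (major type, subtype, parameters)."""
    msg = Message()
    msg["Content-Type"] = header_value
    major = msg.get_content_maintype()
    subtype = msg.get_content_subtype()
    # get_params() returns [(type/subtype, ''), (name, value), ...];
    # drop the first entry to keep only the real parameters.
    params = dict(msg.get_params()[1:])
    return major, subtype, params


print(parse_media_type("text/html; charset=iso-8859-1"))
print(parse_media_type("image/tiff"))
```

A receiving application would dispatch on the first two fields and consult the parameters only where the type defines them.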
MIME Major Types
- text: sequences of characters
- image: bitmaps in various forms, e.g., gif, jpeg, tiff, png
- audio: sounds in various forms
- video: animations
- message, multipart: special purpose
- application: catch-all
MIME subtype
- Standard registry: "image/tiff", "application/postscript"
- New registry rules recently approved
- "application/vnd.ms-word"
MIME Text: Characters
- may have a "charset" parameter
- charset determines both Character Encoding Scheme and Repertoire
- text/html: issues covered in Domain 3 (Authoring)
Charset issues
- Cannot standardize on "Unicode"
- Local applications will want national encodings
- Han Unification, other political difficulties
Primary issue
- Standardize when possible (ISO 10646)
- Label when you can't (use MIME charset registration)
- Don't make recipients guess
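To make "label, don't guess" concrete, here is a minimal sketch of a recipient that decodes text strictly according to the charset label and refuses to proceed when the label is missing. The function name `decode_labeled_text` and the sample text are illustrative assumptions, not part of any standard.

```python
from email.message import Message


def decode_labeled_text(content_type, body):
    """Decode body bytes using the charset parameter of a Content-Type value.

    Raises ValueError when no charset label is present, rather than guessing.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    if charset is None:
        raise ValueError("no charset label; refusing to guess")
    return body.decode(charset)


# The same bytes read differently under different labels.
body = "D\u00fcrst".encode("iso-8859-1")
print(decode_labeled_text("text/plain; charset=iso-8859-1", body))
print(decode_labeled_text("text/plain; charset=iso-8859-5", body))
```

The second call succeeds but yields different characters, which is exactly the failure a wrong guess produces silently.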
MIME Content-Language
- Uses standard codes for identifying (primary) language of content
- Completely optional
Standards for network protocols
- Electronic Mail (SMTP)
- Web Browsing (HTTP)
- Broadcast communication (NNTP)
- and more..
- directory access (LDAP)
- interactive sessions (TELNET)
HyperText Transfer Protocol (HTTP)
- Started as a simple protocol, designed for the 1990 vision of the World Wide Web
- http://widget.com/product.html
- Open connection to widget.com
- send "GET /product.html"
- read headers
- read body
- close connection
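The exchange above can be sketched without touching the network: one function builds the request for a URL, another splits a raw response into status line, headers, and body. The HTTP/1.0 request line and the canned response bytes are assumptions made for illustration; a real client would send the request over a TCP connection to port 80 and read until the server closes it.

```python
from urllib.parse import urlsplit


def build_simple_get(url):
    """Return (host, port, request bytes) for a minimal HTTP/1.0 GET."""
    parts = urlsplit(url)
    path = parts.path or "/"
    request = f"GET {path} HTTP/1.0\r\n\r\n"
    return parts.hostname, parts.port or 80, request.encode("ascii")


def split_response(raw):
    """Split a raw response into (status line, header dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


# Hypothetical response, standing in for what the server might send back.
canned = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/html; charset=iso-8859-1\r\n"
    b"Content-Language: en\r\n"
    b"\r\n"
    b"<html>product page</html>"
)

print(build_simple_get("http://widget.com/product.html"))
print(split_response(canned))
```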
HTTP Improvements
- Performance
- Reliability
- Caching
- Persistent connections
- Content negotiation
Simple content negotiation in HTTP
Transparent negotiation in HTTP
Dimensions of negotiation
- Language (Accept-Language)
- Character set (Accept-Charset)
- Capabilities to handle media (Accept)
- Brand of software (User-Agent)
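As one possible reading of language negotiation, the sketch below parses an Accept-Language value with its q (quality) weights and picks the best variant a server has on hand. It is deliberately simplified: it matches language tags exactly and ignores wildcards and prefix matching. The function names are hypothetical.

```python
def parse_accept(header_value):
    """Parse an Accept-* header into [(value, q)] sorted by descending q."""
    items = []
    for part in header_value.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0  # a missing q parameter means full preference
        for field in fields[1:]:
            name, _, val = field.partition("=")
            if name.strip().lower() == "q":
                q = float(val)
        items.append((fields[0].lower(), q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(items, key=lambda item: item[1], reverse=True)


def choose_language(accept_language, available):
    """Pick the most preferred available language, or None if none match."""
    for lang, q in parse_accept(accept_language):
        if q > 0 and lang in available:
            return lang
    return None


print(choose_language("da, en-gb;q=0.8, en;q=0.7", ["en", "fr"]))
```

The same parser applies to Accept-Charset and Accept, since all three headers share the comma-separated, q-weighted shape.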
Issues for HTTP Internationalization and Multilingualism
- deployment
- overhead of negotiation
- interaction with authoring, caching
Identifiers in the Web
- URL: locations
- New York Public Library, second floor, third aisle, second shelf, third book from left
- URN: location-independent names
- QP:475.L95; ISBN:0-19-854529-0
- URC: descriptions
- genre: book; title: The Ecology of Vision; author: J.N. Lythgoe; date: 1979; publisher: Clarendon Press, Oxford
URL Requirements
- An object that describes the location of a resource
- Global scope
- parsable
- transportable in many contexts
- extensible
- not loaded with other information
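The "parsable" requirement can be seen in a small sketch: a URL splits mechanically into scheme, host, and path, and reassembles to the original string. This uses Python's standard `urllib.parse` on the example URL from the HTTP slide; it illustrates the requirement only and is not a statement of the URL syntax rules themselves.

```python
from urllib.parse import urlsplit, urlunsplit

url = "http://widget.com/product.html"
parts = urlsplit(url)

# Each component is recoverable without any outside knowledge of the resource.
print(parts.scheme)   # access scheme
print(parts.netloc)   # host
print(parts.path)     # path on that host

# Splitting and rejoining round-trips to the same string.
rebuilt = urlunsplit(parts)
print(rebuilt == url)
```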
URN Requirements
- global scope
- persistent
- scalable
URC: Uniform Resource Characteristics
- Syntax for carrying metadata
- A standard set of tags useful for describing Internet resources
Some unsolved problems
- Internationalization (M. Dürst)
- things go away
- pimples.com
- Apple Computer and Apple Music
- conflicts over short names
- urn:hdl:MTV/I_quit
- how does authority migrate?
Other protocols in the Web
- Access control and ratings
- Rating of entertainment content for adult themes
- How to deal with cultural differences
- Multiple rating services
Summary: Internationalization and Multilingualism in Web Standards
- Content: good progress
- Protocols: are they enough?
- Naming: is there a solution?
- Standards lead deployment