# HTML [![GoDoc](http://godoc.org/github.com/tdewolff/parse/html?status.svg)](http://godoc.org/github.com/tdewolff/parse/html)
This package is an HTML5 lexer written in [Go][1]. It follows the specification at [The HTML syntax](http://www.w3.org/TR/html5/syntax.html). The lexer takes an `io.Reader` and converts its contents into tokens until it reaches EOF.
## Installation
Run the following command:

    go get -u github.com/tdewolff/parse/v2/html

or add the following import and run the project with `go get`:

    import "github.com/tdewolff/parse/v2/html"

## Lexer
### Usage
The following initializes a new Lexer with io.Reader `r`:
``` go
l := html.NewLexer(r)
```
To tokenize until EOF or an error, use:
``` go
for {
	tt, data := l.Next()
	switch tt {
	case html.ErrorToken:
		// error or EOF set in l.Err()
		return
	case html.StartTagToken:
		// ...
		for {
			ttAttr, dataAttr := l.Next()
			if ttAttr != html.AttributeToken {
				break
			}
			// ...
		}
		// ...
	}
}
```
All tokens:
``` go
ErrorToken TokenType = iota // extra token when errors occur
CommentToken
DoctypeToken
StartTagToken
StartTagCloseToken
StartTagVoidToken
EndTagToken
AttributeToken
TextToken
```
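
Because the list starts with `iota`, the constants take consecutive values in the order shown. The sketch below illustrates this with a local stand-in for the type, so it runs without the package. The underlying integer type, the `tokenNames` table, and the `tokenName` helper are assumptions made for illustration, not the package's own code.

``` go
package main

import "fmt"

// TokenType is a local stand-in for the package's token type.
// Assumption: any unsigned integer type serves for this illustration.
type TokenType uint32

// Same order as the list above, so iota assigns 0 through 8.
const (
	ErrorToken TokenType = iota // extra token when errors occur
	CommentToken
	DoctypeToken
	StartTagToken
	StartTagCloseToken
	StartTagVoidToken
	EndTagToken
	AttributeToken
	TextToken
)

// tokenNames is a hypothetical lookup table for debugging output.
var tokenNames = [...]string{
	ErrorToken:         "Error",
	CommentToken:       "Comment",
	DoctypeToken:       "Doctype",
	StartTagToken:      "StartTag",
	StartTagCloseToken: "StartTagClose",
	StartTagVoidToken:  "StartTagVoid",
	EndTagToken:        "EndTag",
	AttributeToken:     "Attribute",
	TextToken:          "Text",
}

// tokenName returns a readable name, or "Invalid(n)" for values
// outside the known range.
func tokenName(tt TokenType) string {
	if int(tt) < len(tokenNames) {
		return tokenNames[tt]
	}
	return fmt.Sprintf("Invalid(%d)", tt)
}

func main() {
	for tt := ErrorToken; tt <= TextToken; tt++ {
		fmt.Println(uint32(tt), tokenName(tt))
	}
}
```

A table like this can make log output easier to read when debugging a tokenizing loop.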
### Examples
``` go
package main

import (
	"fmt"
	"io"
	"os"

	"github.com/tdewolff/parse/v2/html"
)

// Tokenize HTML from stdin.
func main() {
	l := html.NewLexer(os.Stdin)
	for {
		tt, data := l.Next()
		switch tt {
		case html.ErrorToken:
			// io.EOF marks the normal end of input; anything else is an error.
			if l.Err() != io.EOF {
				fmt.Println("Error on line", l.Line(), ":", l.Err())
			}
			return
		case html.StartTagToken:
			fmt.Println("Tag", string(data))
			// Read the tokens that follow the start tag until one is not an attribute.
			for {
				ttAttr, dataAttr := l.Next()
				if ttAttr != html.AttributeToken {
					break
				}

				key := dataAttr
				val := l.AttrVal()
				fmt.Println("Attribute", string(key), "=", string(val))
			}
			// ...
		}
	}
}
```
## License
Released under the [MIT license](https://github.com/tdewolff/parse/blob/master/LICENSE.md).
[1]: http://golang.org/ "Go Language"