.\" markov3
.\" @(#)markov3.6 1.1 3/6/87 epimass!jbuck
.TH MARKOV3 6 "3/6/87"
.UC 4
.SH NAME
markov3 \- Digest and spit out quasi-random Usenet articles
.SH SYNOPSIS
.B markov3
[
.B \-pv
] [
.B \-n
.I n_articles
] [
.B \-d
.I dumpfile
] [
.B \-s
.I seed
] [
.B \-x
]
files
.SH DESCRIPTION
.PP
.I Markov3
digests Usenet articles and builds an internal data structure that
models the articles as if they came from a random process in which
each word depends only on the previous two. It then emits a series
of articles on the standard output that have the same distribution
of words, word pairs, and word triplets as do the input files.
The name
.I markov3
comes from the fact that this structure is called a Markov chain,
and that the statistics for word triplets are modeled.
Here, a "word" is a sequence of printable characters surrounded by
whitespace. Paragraph breaks (blank lines) are also treated as a
"word". Paragraphs of included text are treated as single "words"
and printed as "> ...".
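.PP
As a rough illustration of the word-triplet model described above, the
following C sketch records every consecutive triple of words and picks a
successor for a pair in proportion to how often it followed that pair.
The flat table and the names learn, count_successors and next_word are
hypothetical and chosen for brevity; they are not taken from the
.I markov3
source, whose internal data structure is not described here.
.PP
.nf
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not markov3's real data structure: a flat
 * table of observed word triples (w1, w2 -> next). */
#define MAX_TRIPLES 1024

struct triple { const char *w1, *w2, *next; };

static struct triple table[MAX_TRIPLES];
static int n_triples = 0;

/* Record every consecutive triple in words[0..n-1].
 * Returns the total number of triples stored so far. */
int learn(const char **words, int n)
{
    int i;
    assert(words != NULL);
    for (i = 0; i + 2 < n && n_triples < MAX_TRIPLES; i++) {
        table[n_triples].w1 = words[i];
        table[n_triples].w2 = words[i + 1];
        table[n_triples].next = words[i + 2];
        n_triples++;
    }
    return n_triples;
}

/* Count how many recorded triples start with the pair (w1, w2). */
int count_successors(const char *w1, const char *w2)
{
    int i, count = 0;
    for (i = 0; i < n_triples; i++) {
        if (strcmp(table[i].w1, w1) == 0 && strcmp(table[i].w2, w2) == 0)
            count++;
    }
    return count;
}

/* Pick a successor of (w1, w2) with probability proportional to how
 * often it followed that pair; NULL if the pair was never seen. */
const char *next_word(const char *w1, const char *w2)
{
    int i, pick, count = count_successors(w1, w2);
    if (count == 0)
        return NULL;
    pick = rand() % count;
    for (i = 0; i < n_triples; i++) {
        if (strcmp(table[i].w1, w1) == 0 && strcmp(table[i].w2, w2) == 0) {
            if (pick-- == 0)
                return table[i].next;
        }
    }
    return NULL;
}
```
.fi
.PP
Generating text then amounts to calling next_word repeatedly, each time
shifting the pair along by one word. A linear scan is used only to keep
the sketch short.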
.PP
By default, the program expects to be fed Usenet articles; it strips
off headers, included text, and signatures (or at least it tries).
The
.B \-p
(plain) option disables header stripping; without it,
everything up to and including the first blank line is skipped.
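.PP
The header-skipping rule can be pictured with a small C sketch. It
assumes the article has already been split into lines without trailing
newlines, and the helper name body_start is hypothetical; the real
program does this work in its lexical analyzer, which is not shown here.
.PP
.nf
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from markov3's source: given an
 * article split into lines (no trailing newlines), return the index
 * of the first body line, i.e. the line after the first empty line.
 * Returns n if there is no empty line at all. */
int body_start(const char **lines, int n)
{
    int i;
    assert(lines != NULL);
    for (i = 0; i < n; i++) {
        if (lines[i][0] == 0)   /* an empty line ends the headers */
            return i + 1;
    }
    return n;
}
```
.fi
.PP
With the plain option, the equivalent of this step would simply be
skipped and processing would start at line 0.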
.PP
By default, 10 articles, separated by form feeds, are written on the
standard output. The
.B \-n
option lets you specify a different number.
.PP
The
.B \-s
option supplies the
.I seed
for the random number generator.
.PP
The
.B \-x
option leaves the random number generator unseeded, so repeated runs
on the same input produce the same output; this is useful
for simulating people who repeat themselves.
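.PP
In C, calling rand(3) without first calling srand(3) behaves as if
srand(1) had been called, which is why an unseeded run repeats itself.
The sketch below illustrates the idea. The names init_random and draw
are hypothetical, and seeding from the clock by default is an
assumption; this page does not say where the default seed comes from.
.PP
.nf
```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical start-up step; no_seed stands in for the -x flag.
 * Seeding from the clock is an assumption, not markov3's documented
 * behavior. */
void init_random(int no_seed)
{
    if (!no_seed)
        srand((unsigned) time(NULL));
    /* With no srand() call, rand() acts as if srand(1) had been
     * called, so every run draws the same sequence. */
}

/* Store the first n values drawn after srand(seed) in out[]. */
void draw(unsigned seed, int *out, int n)
{
    int i;
    assert(out != NULL && n >= 0);
    srand(seed);
    for (i = 0; i < n; i++)
        out[i] = rand();
}
```
.fi
.PP
Two calls to draw with the same seed fill their arrays with identical
values, which is the repetition an unseeded run relies on.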
.PP
The
.B \-d
(dump) option dumps a representation of the internal data structure
built by
.I markov3
on the named file.
.PP
Finally, the
.B \-v
(verbose)
option prints some statistics on the standard error.
.SH "CAVEATS"
This program allocates lots of memory if given large amounts of input.
On virtual memory systems, the paging behavior is atrocious because
pointers tend to point every which way, and many pointers are dereferenced
for every word processed. This could be improved, I'm sure.
.PP
Posting articles generated by
.I markov3
to the net may be hazardous to your health.
.PP
Not as smart as Mark V. Shaney.
.SH "PORTABILITY"
An effort has been made to make this program as portable as possible;
an earlier version was much less portable because of problems with
null pointers and rand(3). Please let me know if you have further problems.
.PP
If you don't have lex, you'll need to rewrite the lexical analyzer,
but most of the program is in C.