2/17/00 ...more tagged data manipulation problems

Edgar Allen era at sky.net
Fri Feb 18 19:27:13 CST 2000


Gorden-Ozgul, Patricia E writes:
>
>This solution is great  (and I will test it)  however, the data I am
>currently working with is test data.   Test data varies by source, so
>structure varies.  I must establish programmatic ways to do the text
>manipulation.  The actual (non test) data files will be huge; therefore, I
>don't think I want to use vi and manually manipulate the stuff.  I'd prefer
>using one of the utilities or a shell.
>
'sed' can be tormented into doing complicated things which can be
done more easily by other tools, like 'vi' or its line-oriented
counterpart 'ex'.

In UnixLand almost everything is a file.

The shell that you type characters to is listening to your terminal
device, /dev/tty? (the '?' stands for whichever terminal you are on).

Most Unix utilities deal with file descriptors stdin, stdout, and
stderr.  Those are set up pointing to your /dev/tty? by the 'login'
process.  Any file can be substituted later.  When you do:

	sed -e G single.in > double.out

You are redirecting stdout to the file 'double.out' instead of '/dev/tty?'.
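
If you want to watch that happen, here is a small sketch.  The file names are
the ones from the command above; the three sample lines and the scratch
directory are made up for the illustration.

```shell
# Work in a scratch directory so nothing of yours gets overwritten.
cd "$(mktemp -d)" || exit 1

# A three-line sample file (contents invented for this sketch).
printf 'one\ntwo\nthree\n' > single.in

# sed's G appends a newline plus the (empty) hold space to every line,
# which double-spaces the text.  '>' points stdout at double.out.
sed -e G single.in > double.out

# Nothing showed up on the terminal; the output went to the file.
wc -l < single.in     # reports 3 lines
wc -l < double.out    # reports 6 lines
```

Leave off the '> double.out' and the same six lines land on your terminal.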

You can do the same thing with stdin by using '<' or '<<'.  The '<<' form,
called a here-document, is the one I am about to use.  It feeds stdin from
the lines that follow in the shell script being executed, up to a line
consisting only of, in this case, 'EndExCmds'.  'ex' sees those lines as
though you were typing them in from a console.
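
A minimal sketch of '<<' on its own, using 'sort' instead of 'ex' so it runs
anywhere.  The delimiter 'EndOfInput', the variable name, and the three words
are arbitrary choices of mine for the example.

```shell
# Everything between '<<EndOfInput' and the line 'EndOfInput' becomes
# stdin for 'sort', as though it had been typed at the terminal.
sorted=$(sort <<EndOfInput
pear
apple
fig
EndOfInput
)

# Show what sort produced from the here-document.
echo "$sorted"
```

One thing to keep in mind: with a bare delimiter like this, the shell still
expands $variables inside the block.  Quote the delimiter (<<'EndOfInput') if
you want the lines passed through untouched.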

Save the following lines into 'JoinAndPipe.sh' (cut-n-paste is OK) and then run:
	chmod +x JoinAndPipe.sh
	./JoinAndPipe.sh UntaggedFile
	head UntaggedFile

You will be pleasantly surprised.  The script drives 'ex', which is the
'colon-mode' of 'vi', so nothing has to be typed by hand.

------< begin JoinAndPipe.sh >-------
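#!/bin/sh
# Usage: ./JoinAndPipe.sh file    (the file is edited in place)
# The 'ex' commands below, in order:
#   join each line starting with two spaces onto the line before it
#   any one character plus one or more spaces before 'htm' becomes '.htm'
#   append '|' to every line that starts with '<'
#   write the file and quit
ex "$1" <<EndExCmds
g/^  /-,.j
%s/.  *htm/.htm/
%s/^<.*/&|/
w
q
EndExCmds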
------< end JoinAndPipe.sh >-------

You can change the 'ex' commands to do other transformations and save the
result as 'Data2html.sh' or whatever.
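
As a rough idea of what such a variant could look like, here is a sketch that
wraps every line of a file in <p>...</p>.  I don't know what your real
conversion needs, so the function name 'wrap_paragraphs' and the choice of
transformation are purely hypothetical.  It assumes an 'ex' that accepts the
POSIX '-s' (batch mode) option.

```shell
# wrap_paragraphs FILE  (hypothetical helper, not part of JoinAndPipe.sh)
# Edits FILE in place, turning each line into <p>line</p>.
wrap_paragraphs() {
    # The quoted delimiter keeps the shell from touching '\' or '$'
    # in the ex commands.  '-s' runs ex in batch mode, without prompts.
    ex -s "$1" <<'EndExCmds'
%s/.*/<p>&<\/p>/
w
q
EndExCmds
}
```

Call it as 'wrap_paragraphs SomeFile', or drop the function wrapper and save
the 'ex' lines as a script of their own, the same way as JoinAndPipe.sh.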

Transforming text files was the first thing Unix was used for, and making
that scriptable was early on the list of features.  It remains at the heart
of UnixLand.



