Hello everyone.
How do you scan a text file in the following way?
It has multiple occurrences of the same label on successive lines within a record. For instance:
****newrecord****
*TTTT lorum ipsem
*TTTT feugiat, ullamcorper
*RRRR suscipit dolor.....
*RRRR fuerat inscipior
*SSS nunquam alebit
****endofrecord****
That is the actual layout: the labels are all in caps and preceded by *, the start and end of each record are marked as shown, and each line consists of a field label followed by field content. I am trying to turn this into a CSV file for import into another database, but I am having problems because the number of identical field labels varies from record to record. The originating database appears to have allowed the user to create any number of identically labelled fields, so the number of fields in a record varies from 30 to 50, and there can be 5-10 duplicate field labels in any record. The file is several thousand records long.
The problem is to go through the file and remove only the second and subsequent occurrences of a label, appending their content to the line that carries the first occurrence. So applying the process to the example above should give:
*TTTT lorum ipsem feugiat, ullamcorper
*RRRR suscipit dolor..... fuerat inscipior
*SSS nunquam alebit
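In case it makes the goal clearer, this is roughly the shape of what I've been fumbling towards in gawk (untested and quite possibly wrong; it assumes a label is the first word on the line and contains no spaces, and it prints a blank line between records so they stay separate for the next step):

    # merge-labels.awk -- rough sketch, not tested on the real file
    /^\*\*\*\*newrecord\*\*\*\*/ {      # start of record: reset per-record state
        delete content; delete order; n = 0
        next
    }
    /^\*\*\*\*endofrecord\*\*\*\*/ {    # end of record: emit merged fields
        for (i = 1; i <= n; i++)
            print order[i] " " content[order[i]]
        print ""                        # blank line keeps records separable
        next
    }
    /^\*/ {                             # field line: *LABEL content...
        label = $1
        sub(/^[^ ]+ */, "")             # strip the label, keep the content
        if (label in content)
            content[label] = content[label] " " $0  # append to first occurrence
        else {
            order[++n] = label          # remember first-seen order
            content[label] = $0
        }
    }

Something like gawk -f merge-labels.awk original.txt > merged.txt is how I imagine running it, with the filenames just stand-ins for the real export.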
Once the file is in that state, it should be simple to go through and change the carriage returns within each record into tabs, and then do the import.
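If the merged file keeps a blank line between records (as in the sketch above), I'm guessing gawk's paragraph mode could do that conversion in one pass; again untested, with made-up filenames:

    # RS = "" makes gawk read blank-line-separated records;
    # gsub then turns each record's internal newlines into tabs
    gawk 'BEGIN { RS = "" } { gsub(/\n/, "\t"); print }' merged.txt > import.txt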
I realise that uniq will find duplicates, that tr will do replacements, and that you can probably pipe one into the other... or use GAWK? But I can't seem to figure out how to make them do exactly this. My main problem is how to delete just the second and subsequent duplicate labels, instead of either all occurrences of the label or the whole record containing the duplicate.
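For example, as far as I can tell uniq only collapses identical adjacent whole lines, so it never fires here, where the labels repeat but the content differs:

    $ printf '*TTTT a\n*TTTT b\n' | uniq
    *TTTT a
    *TTTT b

(If I'm reading the man page right, GNU uniq's -w option can compare just the first few characters, but that would throw away the second line's content rather than merging it.) Any ideas?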
Regards & thanks in advance to anyone patient enough to help.
Peter Berrie