PGNutils.Txt - 8 SEP 2006 - by Tom McCormick - mccormit@sbcglobal.net

----- YOU MAY WISH TO PRINT THIS FILE FOR FUTURE REFERENCE --------

Many of the freeware utility programs described below were developed
under the Windows XP "command window", which emulates MS DOS. To reach
it, click Start, then Run, key in CMD.EXE (for XP) or COMMAND.COM (for
Windows 9x or ME), and press the Enter key. Key in EXIT to return to
Windows. I expect these programs to run with ANY version of Windows,
but I have not tested them with every version.

All of these programs expect an input file to use Carriage-Return and
Linefeed character pairs as line delimiters. PGN from Unix or Linux
systems should first be run through the crlf.exe utility, which
converts the single-character "newline" to CR/LF.

All of these utilities will accept ANY size input file. However, when
PGNTRIM5 or PGNTRIM6 is run to normalize fresh PGN files, individual
PGN GAMES larger than 8,000 characters are sent to the .BAD file for
review and do not appear in the new output file. This lets the user
review each rejected game and decide whether to edit the problem
manually or discard that game.

Any PGN game containing a [FEN tag is passed through to the output
file without any normalizing or correcting.

IMPORTANT... Run PGNTRIM5 or PGNTRIM6 before running any of the other
PGN utilities described at the end of this document.

PGNTRIM5 and PGNTRIM6
---------------------
The comments about PGNTRIM5 also apply to PGNTRIM6, a later version.
PGNTRIM6 differs from PGNTRIM5 in only a few ways...notably, semicolons
are permitted within tags rather than being treated, as the PGN
standard specifies, as the beginning of a comment that runs through the
end of that line/record.

PGNTRIM5.EXE is a freeware Windows utility that corrects most PGN
syntax errors and directs games which need human review into a
separate output file named BADTRIM5.BAD.
That file usually furnishes enough information for the user to decide
whether to correct the input file and run again, or to accept the
number of games rejected from the new output file. PGNTRIM5 never
changes the original input file.

PGNTRIM5.EXE requires no installation routine, nor any .DLL file(s).
It runs from a Microsoft DOS prompt, or from a Microsoft Windows 95,
98, ME, NT, NT2000, or XP command prompt, and probably under VISTA.

PGNTRIM5 does NOT detect illegal/impossible moves, but PGNSCID.EXE is
freeware and will catch most of these. I run pgntrim5 first against
newly downloaded PGN files in order to clean up common syntax problems
and omissions, and to drop text info such as titles and crosstables.
Then I run the output file into pgnscid to catch any illegal moves.
This approach greatly reduces the manual review and editing needed to
rescue important games or discard others, and leaves a much cleaner
PGN file for viewing, or for insertion into a database such as SCID,
CHESSBASE, FRITZ, or BOOKUP Lite.

PGNTRIM5 will "repair" correctable PGN syntax errors such as:

    cd4      which is changed to  cxd4
    e8Q      which is changed to  e8=Q
    f1 Q +   which becomes        f1=Q+   etc.

You may run the accompanying TEST.PGN file to see what PGNTRIMn will
do, i.e.,

    PGNTRIM6 TEST.PGN TESTOUT.PGN

If you do not enter the filenames in the command tail, you will simply
be prompted for them as the program begins.

(P)ortable (G)ame (N)otation format is rather thoroughly defined, and
is an effective means to record and distribute chess game moves. The
standard is available over the internet from several sources.

Recently, people have been submitting "annofritzed" PGN games to
internet websites. These often reach more than 8,000 characters of
movetext...and all too often contain unbalanced alternate move tokens
(..)..) for which nesting IS permitted, or they may contain unbalanced
curly brace tokens {..} delimiting comments. Fritz will produce
correct PGN syntax when autofritzing, but humans seem driven to
"improve or clarify" these comments, and they frequently end up with
these tokens unbalanced. The PGN standard forbids nested {..{...}..}
curly braces, anyway.

A common error made by players trying to enhance comments or alternate
moves is to use a semicolon ";". The PGN standard requires that ALL
text following it in that input record be dropped as comments. If that
occurs within {..} or within (...) then the closing character is
dropped, causing unbalance. Ahem...a STANDARD is a STANDARD, thank you.

Recently, PGN games have been appearing on the internet which are
Fischer-Random games. If there is no FEN statement, or other
indication of this, then "illegal moves" such as 1.Nb3 will pass
through syntax checking, but will appear to be an illegal move to
database programs! Some standard indication of Fischer-Random games is
being debated, and needs to be added to the PGN standard. Until the
PGN standard is final, PGNTRIM5 or 6 will not recognize a
Fischer-Random tag such as [Variant "Fischerandom"].

Chess magazines and books are not immune from typographical errors and
omissions such as leaving out moves entirely, leaving pieces off the
diagrams, having two black Kings, no White King, displaying entirely
the wrong diagram, etc. Persons collecting PGN chess game records do
not want such problems to show up while a game is being studied!

Normalization programs can detect most PGN problems, fix many, and
tell the user about the others so that they can be manually edited, or
the game discarded. PGNTRIM5 directs erroneous games into PGNTRIM5.BAD
where they can be reviewed and edited separately from the clean output
file.
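
The balance rules described above (parentheses may nest, curly braces
may not, and a semicolon discards the rest of its line) can be
pictured with a short Python sketch. This is only an illustration of
the idea, not PGNTRIM5's actual code; the name check_balance and its
messages are made up for the example. It assumes the reading given
above, in which a semicolon cuts the line wherever it appears, even
inside a comment.

```python
def check_balance(movetext):
    """Return a list of problem descriptions; an empty list means balanced."""
    problems = []
    paren_depth = 0      # ( ... ) alternate-move groups may nest
    in_comment = False   # { ... } comments may NOT nest
    for line in movetext.splitlines():
        # Assumed reading: ';' discards the rest of the line wherever it is.
        line = line.split(";", 1)[0]
        for ch in line:
            if ch == "{":
                if in_comment:
                    problems.append("nested '{' inside a comment")
                in_comment = True
            elif ch == "}":
                if in_comment:
                    in_comment = False
                else:
                    problems.append("'}' without an opening '{'")
            elif in_comment:
                continue          # parentheses inside a comment are plain text
            elif ch == "(":
                paren_depth += 1
            elif ch == ")":
                if paren_depth == 0:
                    problems.append("')' without an opening '('")
                else:
                    paren_depth -= 1
    if in_comment:
        problems.append("'{' never closed")
    if paren_depth:
        problems.append("%d '(' never closed" % paren_depth)
    return problems
```

A game whose movetext yields a non-empty list would be a candidate for
the .BAD file rather than the clean output.
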
If you edit PGNTRIM5.BAD and delete some games...leaving only games
which you have `fixed`, then you can simply COPY the cleaned
PGNTRIM5.BAD file onto the end of the clean output file, for example:

    copy newfile.pgn+pgntrim5.bad

(NOTE no spaces in the file list! You may also prefer to drop the
[Warning tag from the corrected game.)

An example PGN game record appears below. Heading records are called
"tags", and seven of them are required as a minimum....the first 7
shown in the example are required in any PGN game. Other tags, such as
the "Opening" and "ECO" tags shown, are optional. All tags must conform
to the standard in order to be useful to a wide audience...Each tag
must begin with [ and end with ], the tag name must begin with one
uppercase letter, the text must be enclosed within quotation marks,
etc. It is somewhat surprising just how many PGN games have simple
syntax errors in the tag records!

NOTE: PGNTRIM5 can be forced to retain all tags in the output file by
adding /alltags to the command tail. Otherwise only the following tags
are preserved:

    [Event  [Site  [Date  [Round  [White  [Black  [Result
    [ECO  [Opening  [WhiteElo  [BlackElo  and  [Comment

Stripping the [Annotator, [PlyCount, [Clock, etc. etc. saves
considerable file space, but if you MUST have them all then always
include /alltags in the command line, for example:

    pgntrim5 2006WCC.PGN 2006WCC.TRM /alltags

By default, there will be exactly four complete moves per line in the
output file unless you specify between one and seven moves. One move
per line is useful in teaching situations where you want the students
to comment on each move (in writing). Four moves per line permits
printing with a decent sized font without overflows. The number is
specified in the command tail, i.e.,

    PGNTRIM6 OLDFILE.PGN NEWFILE.PGN /MPL:1

Normalization programs detect deviations from standard, and either fix
the problem, notify the user, or both.
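
A few of the repairs this document mentions (cd4 to cxd4, e8Q to e8=Q,
f1 Q + to f1=Q+, zeroes in castling, and O-O5. to O-O 5.) can be
sketched with regular expressions in Python. This is a guess at the
general technique, not PGNTRIM5's real logic; repair_movetext and the
patterns are hypothetical, and the sketch assumes comments have
already been stripped from the movetext.

```python
import re

# Pawn capture written without "x", e.g. "cd4" (files must be adjacent).
PAWN_CAPTURE = re.compile(r"\b([a-h])([a-h][1-8])\b")
# Promotion with "=" missing or with stray spaces, e.g. "e8Q", "f1 Q +".
PROMOTION = re.compile(
    r"\b([a-h](?:x[a-h])?[18])\s*=?\s*([QRBN])(?:\s*([+#]))?(?=\s|$)")
# Castling typed with zeroes instead of the letter O.
ZERO_CASTLE = re.compile(r"(?<![\w-])0-0(-0)?")
# Castling run together with the next move number, e.g. "O-O5."
CASTLE_THEN_NUMBER = re.compile(r"\b(O-O(?:-O)?)(\d+\.)")


def _add_capture_x(match):
    from_file, target = match.group(1), match.group(2)
    if abs(ord(from_file) - ord(target[0])) == 1:
        return from_file + "x" + target
    return match.group(0)          # not a plausible pawn capture; leave it


def _fix_promotion(match):
    return match.group(1) + "=" + match.group(2) + (match.group(3) or "")


def repair_movetext(text):
    """Apply the simple repairs; anything else is left untouched."""
    text = ZERO_CASTLE.sub(
        lambda m: "O-O-O" if m.group(1) else "O-O", text)
    text = CASTLE_THEN_NUMBER.sub(r"\1 \2", text)
    text = PAWN_CAPTURE.sub(_add_capture_x, text)
    text = PROMOTION.sub(_fix_promotion, text)
    return text
```

Incomplete moves such as a bare a8 are deliberately left alone here,
since there is no way to know which piece was intended.
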
Missing tags, illegal moves, or incomplete moves such as B7, a8, or Rx
cannot be fixed and are simply reported to the user for editing or
discarding the game. Other problems such as spacing errors can usually
be fixed by a normalization program, so Nxg4Nbd7 (no space between
White and Black halfmoves) can be fixed to Nxg4 Nbd7, and O-O5. can be
fixed to O-O 5. Castling must use the letter O, not zeroes; a
normalization program can easily substitute to fix this, as PGNTRIMn
does. Missing half-moves or entire moves can be detected and reported,
as can a result code which does not match the [Result tag.

[Event "Example PGN Chess Game Record"]
[Site "Moscow"]
[Date "2003.12.25"]
[Round "2"]
[White "Blaganov"]
[Black "Dufus"]
[Result "1-0"]
[Opening "Scandinavian"]
[ECO "B01"]

1.e4 d5 2.exd5 Qxd5 3.Nc3 Qd8 4.d4 Nf6 {B01 Scandinavian}
5.Bc4 c6 6.Nf3 Bg4 7.Bxf7 Kxf7 8.Ne5 Kg8
9.Nxg4 Nbd7 10.Qe2 Nxg4 11.Qe6# 1-0

EXAMPLE GAMES NORMALIZED USING PGNTRIM5
---------------------------------------
Here is an example "BEFORE" and "AFTER" using PGNTRIM5. This very old
game was annotated by the computer program Fritz 6...a process called
annofritzing. There are many comments within curly braces {...}, NAG
comments... $17, move continuations following an alternate move
sequence, and there are many nested alternate moves i.e., ( ( ( ) ) )

[Event "New Orleans"]
[Site "New Orleans"]
[Date "1849.??.??"]
[Round "?"]
[White "Morphy, Paul "]
[Black "J. MacConnell sr"]
[Result "1-0"]
[Annotator "Fritz 6 (6s)"]
[PlyCount "57"]
[EventDate "1849.??.??"]

1. e4 {C39: King's Gambit Accepted: 3 Nf3 g5 4 h4} e5 2. f4 exf4 3. Nf3 g5 4. h4
g4 5. Ne5 h5 6. Bc4 Rh7 7. d4 d6 8. Nd3 f3 9. g3 (9. gxf3 Be7 10. Be3 Bxh4+ 11.
Kd2 Bg5 12. f4 Bf6 13. a3 c6 14. Nc3 Bh8 15. f5 Ne7 16. Qe2 Kf8 17. f6 Bxf6 18.
Raf1 d5 19. Rxf6 dxc4 20. Ne5 Nd7 21. Nxd7+ Bxd7 22. Rh6 Rg7 23. R6xh5 Ng8
{Pektor,A-Zvara,P/Prague 1992/0-1 (48)}) 9... Nc6 10. Nf4 $146 (10. c3 Nge7 (
10... Nce7 11. Kf2 c6 12. Nf4 Qc7 13. Qb3 b5 14. Bd3 Rh8 15.
Re1 Ng6 16. Nxg6 fxg6 17. e5 Ne7 18. Bxg6+ Kd8 19. Qf7 Nxg6 20. Qxg6 Qg7 21. Bg5+ Kc7 22. exd6+ Kb6 23. Bd8+ Ka6 24. Qxg7 Bxg7 25. Bc7 { Abbe de Lionne & Morant-Maubisson & Auzout/Paris 1680/1-0 (40)}) 11. Nf4 a6 12. a4 Bg7 13. Qb3 Bh8 14. Nxh5 Kf8 15. Nf4 Na5 16. Qa2 Nxc4 17. Qxc4 c6 18. Nd2 d5 19. exd5 cxd5 20. Qb4 Bf6 21. Nf1 Kg7 22. h5 Nc6 23. Qc5 Be6 24. Qa3 Qd7 { Jannisson & Maubisson-Lionne & Morant/Paris 1680 (36)}) (10. Bb5 d5 11. Ne5 Bd7 12. Nxd7 Qxd7 $17 (12... Kxd7 $2 13. exd5 Bd6 14. Kf2 $18 (14. dxc6+ $6 bxc6 15. Ba4 Bxg3+ 16. Kf1 Rb8 $16 (16... Bxh4 $4 { taking the pawn will bring Black grief} 17. Qd3 $18)))) 10... Bd7 (10... Nf6 11. Nc3 $17) 11. Nc3 Nf6 (11... Bg7 12. Be3 $17) 12. Be3 Ne7 (12... Bh6 13. Rf1 $17) 13. Kf2 c6 (13... Bh6 14. e5 dxe5 15. dxe5 $17) 14. Re1 Bg7 15. e5 dxe5 16. dxe5 Nfd5 (16... Nfg8 17. Ne4 Bxe5 18. Ng5 Bxf4 19. Nxh7 Bxe3+ 20. Rxe3 $11 ) 17. Bxd5 (17. Nfxd5 Nxd5 18. Nxd5 cxd5 19. Qxd5 Bh8 $14) 17... cxd5 (17... Nxd5 18. Ncxd5 cxd5 19. Nxd5 Be6 $14 (19... Bxe5 { Black again will not be able to digest the pawn} 20. Bg5 f6 21. Nxf6+ Kf7 22. Rxe5 (22. Qxd7+ $6 {is not possible} Qxd7 23. Nxd7 Bd4+ 24. Kf1 Kg6 $18) 22... Qb6+ 23. Re3 $18)) 18. Bc5 (18. Ncxd5 Nxd5 (18... Bxe5 $2 { is nothing because of} 19. Bb6 Qb8 20. Bd4 $18) 19. Nxd5 Be6 $14 (19... Bxe5 { as before the pawn must remain untouched} 20. Bg5 f6 21. Nxf6+ Kf7 22. Rxe5 Qb6+ 23. Re3 $18)) 18... Bc6 (18... Rc8 19. Bxa7 Qa5 20. Bd4 $15) 19. b4 (19. Qd3 Rh6 $11) 19... b6 (19... d4 $142 20. Qd3 Rh6 $15 (20... dxc3 21. Qxh7 Kf8 22. Rad1 $18 (22. Qxh5 $6 {is the less attractive alternative} Qd2+ 23. Kg1 Kg8 $18) (22. Nxh5 $4 {the pawn is indigestible} Qd2+ 23. Re2 Qxe2+ 24. Kg1 Qg2#))) 20. Bxe7 $14 Qxe7 {The isolani on e5 becomes a target} 21. Nfxd5 Qb7 $4 (21... Bxd5 $142 {is just about the only chance} 22. Nxd5 Qd8 23. Nf6+ Bxf6 24. exf6+ Kf8 25. Qxd8+ Rxd8 $16) 22. Nf6+ $18 Bxf6 23. exf6+ Kf8 24. Qd6+ Kg8 25. Re7 Qc8 26. Rc7 Qf5 27. 
Qxc6 {Threatening mate: Qxa8} Qxc2+ (27... Rf8 {
does not save the day} 28. Nd5 Qe5 $18) 28. Ke3 Rd8 (28... Rf8 29. Rxa7 Qb2 30.
Ra8 Qxc3+ 31. Qxc3 Rxa8 32. Qc7 $18) 29. Rd1 $1 {
the end of the story. Threatening mate... how?} (29. Rd1 Rf8 30. Rxa7 $18) 1-0

...after processing the above file through PGNTRIMn, it appears as

[Event "New Orleans"]
[Site "New Orleans"]
[Date "1849.??.??"]
[Round "?"]
[White "Morphy, Paul "]
[Black "J. MacConnell sr"]
[Result "1-0"]
[Annotator "Fritz 6 6s "]
[PlyCount "57"]
[EventDate "1849.??.??"]

1.e4 e5 2.f4 exf4 3.Nf3 g5 4.h4 g4
5.Ne5 h5 6.Bc4 Rh7 7.d4 d6 8.Nd3 f3
9.g3 Nc6 10.Nf4 Bd7 11.Nc3 Nf6 12.Be3 Ne7
13.Kf2 c6 14.Re1 Bg7 15.e5 dxe5 16.dxe5 Nfd5
17.Bxd5 cxd5 18.Bc5 Bc6 19.b4 b6 20.Bxe7 Qxe7
21.Nfxd5 Qb7 22.Nf6+ Bxf6 23.exf6+ Kf8 24.Qd6+ Kg8
25.Re7 Qc8 26.Rc7 Qf5 27.Qxc6 Qxc2+ 28.Ke3 Rd8
29.Rd1 1-0

---------------------------------------------------------

Here is an example "BEFORE" and "AFTER" using PGNTRIM5. This game was
annotated by the computer program Fritz8. There are many
{[%emt 0:00:00]} elapsed-time remarks which unfortunately use square
braces within the movetext!! Although these are also within curly
brace pairs, using [..] square braces within the movetext area is a
violation of common PGN good practice, if not the standard itself.
PGNTRIM5 will remove these as shown in the example below.

[Event "Fritz8 commentary removal test file"]
[Site "Howie in the Hills, Florida"]
[Date "2004.05.28"]
[Round "?"]
[White "Fritz 8"]
[Black "McGillicuddy, Sean"]
[Result "1-0"]
[ECO "B06"]
[PlyCount "75"]
[Comment "Unfortunately, Fritz 8 also uses funky comment spacing"

{286MB, Fritz8.ctg, Intel 2.5 WinXP } 1. Nf3 {[%emt 0:00:00]} g6 {
[%emt 0:00:00]} 2. e4 {[%emt 0:00:00]} Bg7 {[%emt 0:00:03]} 3. d4 {
[%emt 0:00:00]} d6 {[%emt 0:00:04]} 4. Nc3 {[%emt 0:00:00]} Nc6 {[%emt 0:00:12]
} 5. Bb5 {[%emt 0:00:01]} Bd7 {[%emt 0:00:02]} 6. O-O {[%emt 0:00:02]} a6 {
[%emt 0:00:05]} 7. Be2 {[%emt 0:00:01]} Bg4 {[%emt 0:00:17]} 8.
Be3 { [%emt 0:00:01]} Nf6 {[%emt 0:00:10]} 9. h3 {[%emt 0:00:02]} Bd7 {[%emt 0:00:04] } 10. Qc1 {[%emt 0:00:01]} O-O {[%emt 0:00:25]} 11. Qb1 {[%emt 0:00:02]} e5 { [%emt 0:00:23]} 12. dxe5 {[%emt 0:00:02]} dxe5 {[%emt 0:00:14]} 13. Kh1 { [%emt 0:00:01]} Re8 {[%emt 0:00:14]} 14. a3 {[%emt 0:00:01]} b5 {[%emt 0:00:20] } 15. Bc5 {[%emt 0:00:02]} Be6 {[%emt 0:00:12]} 16. Qc1 {[%emt 0:00:02]} Qc8 { [%emt 0:00:10]} 17. Qd2 {[%emt 0:00:02]} Bxh3 {[%emt 0:00:19]} 18. gxh3 { [%emt 0:00:05]} Qxh3+ {[%emt 0:00:02]} 19. Nh2 {[%emt 0:00:00]} Nd4 { [%emt 0:00:14]} 20. Rfd1 {[%emt 0:00:04]} Rad8 {[%emt 0:00:09]} 21. Qd3 { [%emt 0:00:03]} Qc8 {[%emt 0:00:37]} 22. b4 {[%emt 0:00:03]} h5 {[%emt 0:00:10] } 23. Rac1 {[%emt 0:00:03]} Bh6 {[%emt 0:00:07]} 24. Rb1 {[%emt 0:00:04]} Bf4 { [%emt 0:00:14]} 25. Bf1 {[%emt 0:00:02]} Kg7 {[%emt 0:00:10]} 26. a4 { [%emt 0:00:05]} c6 {[%emt 0:00:06]} 27. Bg2 {[%emt 0:00:02]} Rh8 { [%emt 0:00:31]} 28. Nf3 {[%emt 0:00:07]} h4 {[%emt 0:00:08]} 29. Ne2 { [%emt 0:00:05]} h3 {[%emt 0:00:04]} 30. Nfxd4 {[%emt 0:00:02]} exd4 { [%emt 0:00:09]} 31. Bf3 {[%emt 0:00:04]} Ng4 {[%emt 0:00:07]} 32. Bxg4 { [%emt 0:00:02]} Qxg4 {[%emt 0:00:09]} 33. Rg1 {[%emt 0:00:04]} Qh4 { [%emt 0:00:38]} 34. Bxd4+ {[%emt 0:00:09]} Kg8 {[%emt 0:00:19]} 35. Rbf1 { [%emt 0:00:06]} Rh6 {[%emt 0:00:24]} 36. Ng3 {[%emt 0:00:03]} h2 { [%emt 0:00:56]} 37. Rg2 {[%emt 0:00:02]} Be5 {[%emt 0:00:17]} 38. 
Nf5 {
[%emt 0:00:02]} 1-0

[Event "Fritz8 commentary removal test file"]
[Site "Howie in the Hills, Florida"]
[Date "2004.05.28"]
[Round "?"]
[White "Fritz 8"]
[Black "McGillicuddy, Sean"]
[Result "1-0"]
[ECO "B06"]
[PlyCount "75"]
[Comment "Unfortunately, Fritz 8 also uses funky comment spacing"

1.Nf3 g6 2.e4 Bg7 3.d4 d6 4.Nc3 Nc6
5.Bb5 Bd7 6.O-O a6 7.Be2 Bg4 8.Be3 Nf6
9.h3 Bd7 10.Qc1 O-O 11.Qb1 e5 12.dxe5 dxe5
13.Kh1 Re8 14.a3 b5 15.Bc5 Be6 16.Qc1 Qc8
17.Qd2 Bxh3 18.gxh3 Qxh3+ 19.Nh2 Nd4 20.Rfd1 Rad8
21.Qd3 Qc8 22.b4 h5 23.Rac1 Bh6 24.Rb1 Bf4
25.Bf1 Kg7 26.a4 c6 27.Bg2 Rh8 28.Nf3 h4
29.Ne2 h3 30.Nfxd4 exd4 31.Bf3 Ng4 32.Bxg4 Qxg4
33.Rg1 Qh4 34.Bxd4+ Kg8 35.Rbf1 Rh6 36.Ng3 h2
37.Rg2 Be5 38.Nf5 1-0

OTHER FREEWARE PGN UTILITIES
----------------------------

PGN2ONE
-------
PGN2ONE reads normalized PGN, creates one record per game, and
prepends a 40 character sort key which can be used to sort by White,
Black, ECO, number of moves in game, year of game, etc. Some batch
files have been included in PGNutils.zip to use qsort and perform each
of the above sorts.

I refer to this output format as .111 format, indicating one
line/record per game. The prepended sort/selection record area
provides exactly consistent locations for the important data needed to
sort and select games. This prepended area is removed when the .111
format is converted back to PGN by either ONE2PGN or PGNUNDUP.

The first 7 letters of player names is optimum because it reduces
misspellings. Before you challenge this approach, look up the word
OPTIMUM. It would also be rather awkward to obtain fixed positions for
full names such as Leko, Nimzowitsch, etc.

Here are some examples of one-record-per-game created by PGN2ONE:
...you can see how easy it is to select or sort by critical elements.

1   5    10   15   20   25   30   35   40   ...see BYYEAR.BAT etc. examples.
  White   Black   Year Mvs Re Site ECO
{ Adams   Kasparo 1992 022 0-1 Dor D31} [Event "?"][Site "Dortmund"]...
{ Anand   Kasparo 1998 024 1/2 Lin B55} [Event "It "][Site "Linares "]...
{ Bareev  Kasparo 1999 021 1/2 Sar D80} [Event "It "][Site "Sarajevo "]...
{ Beliavs Kasparo 1979 035 1-0 Min A61} [Event "?"][Site "Minsk"]...
{ Karpov  Kasparo 1996 045 1/2 Las D20} [Event "It "][Site "Las Palmas "]...
{ Kasparo Anand   1999 033 1-0 Wij A45} [Event "Blitz "][Site "Wijk aan Zee "]
{ Kasparo Huebner 1992 048 0-1 Col C23} [Event "?"][Site "Cologne"]...
{ Kasparo Ivanchu 1999 036 1-0 Lin D11} [Event "It "][Site "Linares "]...

Creating one record per game in this way also facilitates the use of
the FIND command to select or reject games containing certain text
strings. FIND comes with all versions of DOS or Windows. For example:

    find "O-O-O" MyBig.111 >CastLong.111
        {the > redirects output to a new file instead of the display.}
or
    find /V "O-O" MyBig.111 >NoCastl.111
        {selects only games in which neither player castles.}
        {the "/V" command parameter OMITS matched records.}
or
    find "C02" MyBig.111 >FrAdvan.111
        {this outputs games of French Defense, Advance var.}
or
    find "2004" MyBig.111 >2004Only.111
        {this outputs only games played during year 2004.}
or
    find /V "1/2" MyBig.111 >NoDraws.111
        {drops drawn games of any length.}

Two or more `find` executions can be used to refine selections further:

EXAMPLE 1
---------
    find "Leko" MyBig.111 >Leko2.111
        {this outputs games played by Leko as White or Black.}
then
    find "Kramnik" Leko2.111 >LekoKram.111
        {this outputs games played between Leko & Kramnik.}

EXAMPLE 2
---------
    find "0-1" MyBig.111 >BlackWin.111
        {drops draws and incomplete games.}
        {drops games shorter than 20 moves.}
    find "1-0" MyBig.111 >WhiteWin.111
        {drops draws and incomplete games.}
        {drops games shorter than 20 moves.}
    copy BlackWin.111+WhiteWin.111 WinsOnly.111
    find " 26." /V WinsOnly.111 >Miniat25.111
        {outputs games shorter than 26 moves.}

ONE2PGN
-------
ONE2PGN reads the one-record-per-game file (any sequence) created
using PGN2ONE, and outputs a new PGN file.
If the one-record-per-game file has been sorted on positions 1 to 40
with the intention of dropping duplicate games, then PGNUNDUP should
be used instead of ONE2PGN in order to drop duplicate games. ONE2PGN
will NEVER drop a game, even if it is a duplicate game...and therefore
the input file sequence to ONE2PGN is of no concern...you MAY or MAY
NOT sort it into any sequence as you wish.

or......

PGNUNDUP reads the sorted output from PGN2ONE, and creates normal PGN
from the incoming 1-record-per-game. The input file is expected to be
in ascending sequence on positions 1 to 40 so that duplicate games can
be detected and dropped. The input file MUST be in that sequence to
use this program, else it will tell you that the input file is "out of
sequence".

PGNBEST6
--------
PGNbest6 reads normal PGN, looks in a plain text table (provided) for
the 3,000 or so greatest player names of all time, and outputs games
if EITHER player is on that list. Kasparov vs Amateur will be written
to the new output file, but NoName vs. Amateur will not. The
PGNbest6.RAT plain text ratings file (user modifiable) should be in
the same folder as the .exe file. This file may be in ANY sequence,
but I find alphabetical by name easier to update!

TIPS ON USING THESE UTILITIES
-----------------------------
Always run PGNTRIM5 first to normalize the PGN syntax.

    Example: pgntrim5 05Linar.pgn 05Linar2.pgn
    Example: pgntrim5 05Linar.pgn 05Linar2.pgn /MPL:5
    {NOTE: /MPL:n where n is 1 to 7 sets moves per line in output}

By using PGN2ONE.exe, you create one line (record) per game, and
prepend sort fields (columns) to it. This makes sorting or selecting
much simpler, since several key sort items are in fixed positions!
Utility programs are provided to convert PGN to one-line-per-game and
back again after sorting or selecting has been done. For example, here
are a few records illustrating this format:

  White   Black   Year Moves Result Site ECO
{ Kramnik Kasparo 2003 018 1/2 Lin D11} [Event "XX SuperGM"][Site "Lin...
{ Radjabo Leko    2003 046 0-1 Lin E12} [Event "XX SuperGM"][Site "Lin...
{ Anand   Ponomar 2003 064 1-0 Lin C65} [Event "XX SuperGM"][Site "Lin...
{ Vallejo Anand   2003 030 1/2 Lin A30} [Event "XX SuperGM"][Site "Lin...
{ Kasparo Radjabo 2003 039 0-1 Lin C11} [Event "XX SuperGM"][Site "Lin...
{ Ponomar Kramnik 2003 040 0-1 Lin B30} [Event "XX SuperGM"][Site "Lin...
{ Radjabo Ponomar 2003 011 1/2 Lin D30} [Event "XX SuperGM"][Site "Lin...
{ Kramnik Vallejo 2003 030 1/2 Lin D15} [Event "XX SuperGM"][Site "Lin...
{ Leko    Kasparo 2003 087 1/2 Lin B55} [Event "XX SuperGM"][Site "Lin...
{ Bacrot  Adams   2003 045 1/2 Rey A45} [Event "Hrokurinn"][Site "Reyk...

The data between the {...} are sort keys. You may use ByECO.BAT to
sort a file in this format by ECO opening code. You may use
ByYear.bat, etc. for other sequences.

The command-line "find" utility which comes with DOS and Windows may
be used against this format very handily, since selections will be
entire games...ready for ONE2PGN to restore to PGN.

    Example: pgn2one 03twic.pgn 03twic.111
             find "Karpov" 03TWIC.111 > 03Karpov.111
             one2pgn 03Karpov.111 03Karpov.pgn

To combine many PGN files, the copy command will suffice.

    Example: copy 02linar.pgn+03linar.pgn+04linar.pgn 0204lin.pgn
             {NOTE No spaces in the multiple file names} {New output}
    Example: copy *.pgn Feb25.pgn

---------------------------------------------------------------------
Additional (and somewhat repetitive) comments about these utilities.

PGN files from Unix or Linux based computers use a single newline
character to terminate each line. Windows PCs require a pair of
characters for this, and the utility program named crlf.exe will `fix`
PGN files from Unix/Linux sources to work properly on Windows PCs.
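
What crlf.exe does can be pictured with a small Python sketch. How the
real utility works inside is not documented here, so to_crlf and
convert_file are illustrative names and the approach is an assumption.
The sketch also writes a separate output file rather than changing the
input file, and it reads the whole file into memory, which may not
suit very large files.

```python
def to_crlf(data):
    """Return bytes in which every line ends with a CR/LF pair."""
    # Collapse existing CR/LF pairs first so they are not doubled,
    # then expand every remaining linefeed into a CR/LF pair.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def convert_file(src_path, dst_path):
    """Copy src_path to dst_path, normalizing line endings to CR/LF."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_crlf(data))
```

Files are opened in binary mode so that Python does not translate the
line endings itself.
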
PGNTRIM5.exe should then be used to normalize the combined PGN file as
follows:

    pgntrim5 RawPGN.pgn Normal.pgn

When combining several PGN files into one PGN file (i.e., COPY
2006.PGN+06*.PGN) you may run across some files that originated on a
Unix/Linux computer and therefore have only a linefeed separator for
lines (called a newline, or \n). For Windows PCs, you need to run the
combined file through CRLF.EXE to ensure that every line is terminated
by a CR/LF character pair as Windows expects.

PGNTRIM5, PGN2ECO3, PGN2ONE, ONE2PGN, PGNUNDUP, and PGNBEST6 all
require a new output filename, and DO NOT change the original file.
QSORT and CRLF require only one filename, and that file is changed...
so make a backup file first if you are worried.

Games rejected by PGNTRIM5 will appear in a file named BADTRIM5.BAD
where they can be reviewed, and then edited or discarded. For example,
games having zero or one move are rejected, and so are games with
nested curly-brace comments such as
{23.Qb6 was better {then if 23...Ka8....}} etc., which are contrary to
the PGN standard. Nested alternate moves within parentheses are
proper, and are handled by pgntrim5 unless they are 'unbalanced'...in
that case they are also sent to the .BAD file. The .BAD file continues
to grow and grow until you delete it, then a new one will be created
when needed.

PGNTRIM5 fixes many common syntax errors and omissions so that the
output file conforms very closely to the `export format` for PGN.
Illegal or impossible moves are detected later when the normalized PGN
files are imported into a database (as with pgnscid, for example).

PGN2ECO3 may optionally be run then to assign ECO codes, using the
first 4 moves compared to the PGN2ECO3.eco plain text table which is
provided. Be careful modifying this .eco file, since the sequence AND
completeness are important. PGN2ECO3 will use the last match it found
in pgn2eco3.eco as it moves down the list. The sequence and content of
PGN2ECO3.eco is critical!
An [Opening tag for each game will also be inserted into the PGN
output file if no such tag existed. For help, enter PGN2ECO3 /?

PGN2ONE.exe converts normalized PGN format files to
one-record-per-game plus a 40-char prefix of useful sort `fields` (or
columns). QSORT or any such program can then be used to ensure the
sequence of the .111 format file. (.111 is my convention, any filetype
may be used). If the file is sorted from position 1 through 40, then
duplicates may be dropped if PGNUNDUP.exe is then executed. An example
command to drop duplicates from a sorted .111 file is:

    pgnundup amber.111 amber.pgn

The one-record-per-game format has many useful functions:

1. It is simpler to select or drop large sets of games, for example
   dropping all draws of less than nn moves, or selecting only games
   from one or several specific years. The `find` command, or a
   custom-written program, is useful for all this.

2. The games can easily be sorted by White, Black, ECO, number of
   moves in the game, result (1-0, 0-1, etc.), or year.

3. When sorted on positions 1 to 40, and used as input to
   PGNUNDUP.exe, duplicate games are dropped as the new output file is
   written.

4. Games having desired characteristics such as neither player
   castling can be easily selected. Likewise for castling long,
   checkmate, or certain variations within an ECO opening code.

If duplicates need not be dropped, you may use ONE2PGN.exe to convert
the .111 format back to PGN...as with

    one2pgn Leko.111 Leko.pgn

Finally, if desired, a PGN file can be reduced to contain only games
where both players are rated 2450 or above by using PGNBEST6.exe,
i.e.,

    pgnbest6 04all.pgn 04best.pgn
or
    pgnbest6 04all.pgn 04best.pgn /ELO

which causes [WhiteElo and [BlackElo tags to be updated or created.

PGNBEST6.exe uses the plain text ratings file PGNBEST6.rat to select
the strongest players. This file need not be in alphabetical, or any
other sequence, but may be easier to maintain in alphabetical
sequence.
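
The idea of matching players on a short name key can be sketched in
Python as follows. The layout of the .rat file is not described in
this document, so the sketch takes a plain set of names instead of
reading that file. The names name_key and keep_game are hypothetical,
and cutting the name at the comma before taking seven characters is an
assumption based on the keys shown in the .111 examples. Because this
document describes the selection rule both as "either player" and as
"both players", the rule is left as a parameter.

```python
def name_key(full_name, width=7):
    """Reduce a PGN player name to a short matching key.

    Assumption: only the surname (the part before any comma) is used,
    cut to at most `width` characters.
    """
    surname = full_name.split(",", 1)[0].strip()
    return surname[:width]


def keep_game(white, black, strong_keys, require_both=False):
    """Decide whether a game is kept, given a set of strong-player keys."""
    hits = [name_key(white) in strong_keys,
            name_key(black) in strong_keys]
    return all(hits) if require_both else any(hits)
```

For example, with strong_keys built from Kasparov, Kramnik and Leko,
a game between "Kasparov, Garry" and "Amateur" is kept under the
"either player" rule but dropped when require_both is set.
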
One method is to capture new FIDE ratings lists into the .rat format,
and keep `classic` masters such as Alekhine, Fischer, etc. at the end
where they can simply be copied as one block into a new .rat file.
Since the `Classic` players are not gaining new ratings, an estimated
rating is used, and would only apply to games in which they played.
Up-and-coming new masters (such as Magnus Carlsen) should be added to
cause their games to be selected. PGNBEST6.rat uses the first seven
characters of player names for selections and matches, since this has
been found by lengthy testing to be optimum.

-----------------------------------------------------------------------
S U M M A R Y
-------------
I run pgntrim5 immediately after downloading PGN files from the
internet.

If I am going to combine two or more PGN files, I do that next using
some variant of the "copy" command to achieve the desired result. For
example:

    copy twic48*.PGN+twic49*.pgn+twic50*.pgn twic2004.pgn

If I am going to select only the games where both players are very
strong, then I run pgnbest6 using the plain-text user-modifiable table
file named pgnbest6.rat

-----------------------------------------------------------------------
The following files are contained in the archive file PGNUTILS.ZIP

111TOELO.EXE   Reads .111 records, adds GM Elo if none present.
111TOELO.RAT   Used by above program. Includes classic Grandmasters.
111YEAR.EXE    Reads .111 in any sequence, appends .111 to specific
               year files such as 1854.111, 2001.111, etc.
2600PLUS.RAT   Recent ratings of Grandmasters with FIDE rating of 2600+.
BYECO.BAT      These batch files sort .111 files in different ways....
BYEVENT.BAT
BYMOVES.BAT
BYRESULT.BAT
BYYEAR.BAT
BYPLAYER.EXE   Reads .111 in any sequence, writes Leko.111, Anand.111, etc.
CHOICE.DOC     Free utility from Microsoft for making choices in batch files.
CHOICE.EXE
CRLF.EXE       Scans any text input (including PGN), and ensures all lines
CRLF.TXT       ...end in CR/LF rather than just \n (Linefeed).
ECOBYDES.TXT   Gives ECO code from opening description.
ECOBYECO.TXT   Gives opening description from ECO code.
FILE_ID.DIZ    Brief summary of PGNUTILS.ZIP for website file listings.
FIXCRLF.EXE    Similar to CRLF
ICCF.RAT       Correspondence chess ratings list
ONE2PGN.EXE    Convert .111 format in any sequence back into PGN.
PGN2ECO3.ECO   OPTIONAL: reads and writes PGN assigning ECO only if missing.
PGN2ECO3.EXE   -------- Caution...these are APPROXIMATE ECO codes, only.
PGN2ONE.EXE    Converts PGN file to .111 file of one line per game.
PGNBEST6.EXE   Reads PGN, writes PGN of GMs having FIDE Elo of 2450+
PGNBEST6.RAT
PGNSCID.EXE    Loader for SCID database will catch illegal or impossible
               ...moves which PGNTRIM5 misses.
PGNTRIM5.DOC   This is the primary normalization program for PGN files.
PGNTRIM5.EXE   Read the .DOC file for further details.
PGNUNDUP.EXE   Reads SORTED .111, writes PGN while dropping duplicate games.
PGNUTILS.TXT   This file
QSORT.BRF      Brief documentation...all you need is in here!
QSORT.DOC      Full documentation
QSORT.EXE      Command-line sort utility handles ANY size file.
TEST.PGN       Important test data exercising PGNTRIM5 & showing functions.
UN_EOF.EXE     Removes excessive end-of-file characters leaving only one!
UN_EOF.TXT