Help with multi-line regex parser

Here developers can talk about how to write a Parser for LogMX

Moderator: admin

rodb
Posts: 9
Joined: Tue Mar 10, 2020 12:46 am

Post by rodb »

Hello

I'm evaluating LogMX and I am having trouble with parsing a debug log for Kerio Connect. Sometimes the logs run over two lines, sometimes three, and sometimes four. I can't figure it out with regex and can't find the fields that are supported by Log4j format.

Here is a sample of the log:

Code: Select all

[10/01/2020 12:57:46.718](15328){dbg}{database} In DbServer\DatabaseOperations.cpp:353 (DatabaseOperations::initializeCommonDatabase)
[#1] (B1CB)                                     New common database created, signature is {95F6754B-F809-4854-A847-

[10/01/2020 12:57:46.733](23592){err}{mapi-interface} In StoreProvider\MSProviderImpl.cpp:452 (MSProviderImpl::Logon)
[#1] (common)                                         Exception of class HResultException: StoreProvider\MSProviderImpl.cpp(371), MSProviderImpl::Logon:
                                                      0x8004011c MAPI_E_UNCONFIGURED

[10/01/2020 12:57:50.546](15328){err}{communication} In SCProvider\NtlmAuthenticator.cpp:111 (PocoNtlmAuth::sendResponseGetChallenge)
[#2] (common)                                        NTLM authentication has been unsuccessful

I have this regex parser string, which parses two-line entries well, but if a log entry spans more than two lines it captures the third and fourth lines into the message field. I would really like those to be separated out into their own column, or even better, to show as an emitter.

Code: Select all

\[(\S+ \S+)\]\s?\((\d+?)\)\s?\{(.*?)\}\s?\{(.*?)\}\s?In\s?(.*?)\n([\S\s]*?[^\[]*)

These are all the fields I want to obtain, with examples:

Built-in fields

Timestamp: [10/01/2020 12:57:46.733] (not including the square brackets)
Level: {err} (not including the curly brackets)
Emitter: {mapi-interface} (not including the curly brackets)
Message: [#1] (common) Exception of class HResultException: StoreProvider\MSProviderImpl.cpp(371), MSProviderImpl::Logon:

Custom fields:

Process ID: (15328) (without the brackets)
Location: In StoreProvider\MSProviderImpl.cpp:452 (without the word 'In' at the beginning)
Error Code: 0x8004011c (these error codes always start with '0x')
Error Type: MAPI_E_UNCONFIGURED (always follows the error code)

Thanks for any help.

Rod
admin
Site Admin
Posts: 556
Joined: Sun Dec 17, 2006 10:30 pm

Re: Help with multi-line regex parser

Post by admin »

Hello Rod,

This log format may be a bit too complex to tackle with a Regex. It might be possible, but because of the optional third line, the regex would be so long and complicated that it would not be maintainable. Instead, I would recommend using a Parser of type "JavaClass", which also lets you do more, such as transforming field values. I wrote a small, simple Java class to parse this log format, and it seems to work well with the example you gave.

I attached this Parser to this message. To import it in LogMX, go to menu "Tools" > item "Options" > tab "Parsers", then click on the import button (last button on the right), and select the file attached to this message. Don't forget to disable/remove the Regex parser you created, so that LogMX doesn't use it, and uses the new one instead.

kerio-connect-parser.export

If you want to modify this Parser, I also included the Java source code in this file (once the Parser is imported, the source file will be located in the "parsers/src/sample/parser" LogMX sub-directory). You can read more about these Parsers here: https://logmx.com/parser-dev

parser-java.png

Please let me know if you have any questions!
Xavier
admin
Site Admin
Posts: 556
Joined: Sun Dec 17, 2006 10:30 pm

Re: Help with multi-line regex parser

Post by admin »

I forgot: for the other forum users who don't want to import the Parser, here is the Java code I used for reference:

Code: Select all

package sample.parser;

import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import com.lightysoft.logmx.business.ParsedEntry;
import com.lightysoft.logmx.mgr.LogFileParser;

/**
 * Sample LogMX Parser able to parse a log file with multi-line support and Relative Date support.<BR/>
 * Here is an example of log file suitable for this parser:<BR/>
 * <PRE>
 * [10/01/2020 12:57:46.718](15328){dbg}{database} In DbServer\DatabaseOperations.cpp:353 (DatabaseOperations::initializeCommonDatabase)
 * [#1] (B1CB)                                     New common database created, signature is {95F6754B-F809-4854-A847-
 *
 * [10/01/2020 12:57:46.733](23592){err}{mapi-interface} In StoreProvider\MSProviderImpl.cpp:452 (MSProviderImpl::Logon)
 * [#1] (common)                                         Exception of class HResultException: StoreProvider\MSProviderImpl.cpp(371), MSProviderImpl::Logon:
 *                                                       0x8004011c MAPI_E_UNCONFIGURED
 *
 * [10/01/2020 12:57:50.546](15328){err}{communication} In SCProvider\NtlmAuthenticator.cpp:111 (PocoNtlmAuth::sendResponseGetChallenge)
 * [#2] (common)                                        NTLM authentication has been unsuccessful
 * </PRE>
 */
public class KerioConnectParser extends LogFileParser {
    /** Current parsed log entry */
    private ParsedEntry entry = null;

    /** Entry date formatter */
    private SimpleDateFormat dateFormat = null;

    /** Entry date format string */
    private final String DATE_FORMAT_STRING = "dd/MM/yyyy HH:mm:ss.SSS"; // TODO: is it really "Day/Month" or "Month/Day"?

    /** Mutex to avoid that multiple threads use the same Date formatter at the same time */
    private final Object DATE_FORMATTER_MUTEX = new Object();

    /** Pattern for entry begin */
    private final static Pattern ENTRY_BEGIN_PATTERN = Pattern.compile(
            "^\\[(\\d{2}/\\d{2}/\\d{4} .*?)\\]\\s*?" // Date
                + "\\((.*?)\\)\\s*?" // Process ID
                + "\\{(.*?)\\}\\s*?" // Level
                + "\\{(.*?)\\}\\s*?" // Emitter
                + "In\\s*(.*)" // Location (stored in the "Location" user-defined field)
                + "$");

    /** Pattern for "Error Type+Code" */
    private final static Pattern ERROR_TYPE_AND_CODE_PATTERN = Pattern.compile(
            "^\\s*?(0x\\S+?) (.*)$");

    /** Pattern for a blank line */
    private final static Pattern BLANK_LINE_PATTERN = Pattern.compile("^\\s*$");

    /** Pattern for multiple spaces */
    private final static Pattern SPACES_PATTERN = Pattern.compile("\\s{2,}");

    /** Buffer for Entry message (improves performance for multi-line entries) */
    private StringBuilder entryMsgBuffer = null;

    /** Key of user-defined field "Process ID" */
    private static final String CUSTOM_FIELD_PID = "PID";

    /** Key of user-defined field "Location" */
    private static final String CUSTOM_FIELD_LOCATION = "Location";

    /** Key of user-defined field "Error Code" */
    private static final String CUSTOM_FIELD_ERR_CODE = "ErrCode";

    /** Key of user-defined field "Error Type" */
    private static final String CUSTOM_FIELD_ERR_TYPE = "ErrType";

    /** User-defined fields names */
    private static final List<String> EXTRA_FIELDS_KEYS = Arrays.asList(CUSTOM_FIELD_PID,
            CUSTOM_FIELD_LOCATION, CUSTOM_FIELD_ERR_CODE, CUSTOM_FIELD_ERR_TYPE);


    /** 
     * Returns the name of this parser
     * @see com.lightysoft.logmx.mgr.LogFileParser#getParserName()
     */
    @Override
    public String getParserName() {
        return "Kerio Connect Parser";
    }

    /**
     * Returns the supported file type for this parser
     * @see com.lightysoft.logmx.mgr.LogFileParser#getSupportedFileType()
     */
    @Override
    public String getSupportedFileType() {
        return "Kerio Connect log files";
    }

    /**
     * Process the new line of text read from file 
     * @see com.lightysoft.logmx.mgr.LogFileParser#parseLine(java.lang.String)
     */
    @Override
    protected void parseLine(String line) throws Exception {
        // If end of file, records last entry if necessary, and exits
        if (line == null) {
            recordPreviousEntryIfExists();
            return;
        }

        Matcher matcher = ENTRY_BEGIN_PATTERN.matcher(line);
        if (matcher.matches()) {
            // Record previous found entry if exists, then create a new one
            prepareNewEntry();

            entry.setDate(matcher.group(1));
            entry.getUserDefinedFields().put(CUSTOM_FIELD_PID, matcher.group(2));
            entry.setLevel(matcher.group(3));
            entry.setEmitter(matcher.group(4));
            entry.getUserDefinedFields().put(CUSTOM_FIELD_LOCATION, matcher.group(5));
        } else if (entry != null) {
            matcher = ERROR_TYPE_AND_CODE_PATTERN.matcher(line);
            if (matcher.matches()) {
                entry.getUserDefinedFields().put(CUSTOM_FIELD_ERR_CODE, matcher.group(1));
                entry.getUserDefinedFields().put(CUSTOM_FIELD_ERR_TYPE, matcher.group(2));
            } else if (!BLANK_LINE_PATTERN.matcher(line).matches()) {
                // appends this line to previous entry message
                if (entryMsgBuffer.length() != 0) {
                    entryMsgBuffer.append('\n');
                }
                entryMsgBuffer.append(SPACES_PATTERN.matcher(line).replaceAll(" ")); // remove multiple spaces
            }
        }
    }

    /** 
     * Returns the ordered list of user-defined fields to display (given by their key), for each entry.
     * @see com.lightysoft.logmx.mgr.LogFileParser#getUserDefinedFields()
     */
    @Override
    public List<String> getUserDefinedFields() {
        return EXTRA_FIELDS_KEYS;
    }

    /**
     * Returns a relative Date for the given entry 
     * @see com.lightysoft.logmx.mgr.LogFileParser#getRelativeEntryDate(com.lightysoft.logmx.business.ParsedEntry)
     */
    @Override
    public Date getRelativeEntryDate(ParsedEntry pEntry) throws Exception {
        return getAbsoluteEntryDate(pEntry);
    }

    /**
     * Returns the absolute Date for the given entry 
     * @see com.lightysoft.logmx.mgr.LogFileParser#getAbsoluteEntryDate(com.lightysoft.logmx.business.ParsedEntry)
     */
    @Override
    public Date getAbsoluteEntryDate(ParsedEntry pEntry) throws Exception {
        synchronized (DATE_FORMATTER_MUTEX) { // Java date formatter is not thread-safe
            if (dateFormat == null) {
                // Now create the date formatter using the right Locale
                // (method "getLocale()" can't be called from the constructor) 
                dateFormat = new SimpleDateFormat(DATE_FORMAT_STRING, getLocale());
            }
            return dateFormat.parse(pEntry.getDate());
        }
    }

    /**
     * Send to LogMX the current parsed log entry
     * @throws Exception
     */
    private void recordPreviousEntryIfExists() throws Exception {
        if (entry != null) {
            entry.setMessage(entryMsgBuffer.toString());
            addEntry(entry);
        }
    }

    /**
     * Send to LogMX the current parsed log entry, then create a new one
     * @throws Exception
     */
    private void prepareNewEntry() throws Exception {
        recordPreviousEntryIfExists();
        entry = createNewEntry();
        entryMsgBuffer = new StringBuilder(80);
        entry.setUserDefinedFields(new HashMap<String, Object>(8)); 
    }
}
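
For readers who want to see what the entry-begin pattern captures without installing LogMX, here is a minimal standalone sketch. It reuses the same expression as ENTRY_BEGIN_PATTERN above; the class name `EntryBeginDemo` and the helper `splitFirstLine` are hypothetical and are not part of the LogMX API or of the attached Parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class (not part of LogMX): exercises the entry-begin regex on its own.
class EntryBeginDemo {
    // Same expression as ENTRY_BEGIN_PATTERN in the Parser above
    static final Pattern ENTRY_BEGIN = Pattern.compile(
            "^\\[(\\d{2}/\\d{2}/\\d{4} .*?)\\]\\s*?" // Date
                + "\\((.*?)\\)\\s*?"                  // Process ID
                + "\\{(.*?)\\}\\s*?"                  // Level
                + "\\{(.*?)\\}\\s*?"                  // Emitter
                + "In\\s*(.*)"                        // Location
                + "$");

    /** Returns {date, pid, level, emitter, location}, or null if the line does not start an entry. */
    static String[] splitFirstLine(String line) {
        Matcher m = ENTRY_BEGIN.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4), m.group(5) };
    }

    public static void main(String[] args) {
        String first = "[10/01/2020 12:57:46.733](23592){err}{mapi-interface} "
                + "In StoreProvider\\MSProviderImpl.cpp:452 (MSProviderImpl::Logon)";
        System.out.println(String.join(" | ", splitFirstLine(first)));

        // A continuation line does not match, which is why the Parser treats it
        // as part of the previous entry (message or error code/type).
        String second = "[#1] (common)    NTLM authentication has been unsuccessful";
        System.out.println(splitFirstLine(second) == null);
    }
}
```

Because only the first line of an entry matches this pattern, every following line can be handled separately, which is what keeps the multi-line logic out of the regex itself.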

rodb
Posts: 9
Joined: Tue Mar 10, 2020 12:46 am

Re: Help with multi-line regex parser

Post by rodb »

Thanks so much, this works perfectly, and now I have a template to start learning how to make the Java parsers!

best, Rod
rodb
Posts: 9
Joined: Tue Mar 10, 2020 12:46 am

Re: Help with multi-line regex parser

Post by rodb »

Now I have three more questions:

1. What if the logs can have timestamps in two different formats? Is it possible to configure a single Java parser to cope with this?

Code: Select all

[10/01/2020 12:57:46.733]
[11/Mar/2020 12:08:24]

2. Can I make some fields optional? For example, the error level field is not in all logs. Some customers have this turned on, and some don't.

Code: Select all

[28/01/2020 13:58:02.185][14668] {https} Task 9032 handler BEGIN 
[28/01/2020 13:58:02.185](21392){err}{communication} In SCProvider\Communicator.cpp:291 (Communicator::checkAndLogResult)
[#124] (B1CB)                                        Exception of class HResultException: SCProvider\HttpConnection.cpp(146), HttpConnection::checkStatus:
                                                     0x80042013 KOFF_E_REQUEST_FAILED (Request info: Batch move message store=aberger@eanw.nexter 

3. Finally, can I make the importer cope with log entries that span one, two, three, or four lines? Depending on the log levels turned on in the customer machine, the number of lines each entry takes up can vary. In the above example, the first has one line, the second has four.

Thank you.
admin
Site Admin
Posts: 556
Joined: Sun Dec 17, 2006 10:30 pm

Re: Help with multi-line regex parser

Post by admin »

Hello,

Yes, basically Java Parsers can do all of that. I made some changes to handle both date formats, but then I realized that in your example, the "Process ID" is now surrounded by square brackets [] and not parentheses () anymore. Is that really the case, or was it a typo in your message?
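
One possible way to accept both timestamp formats is to try each candidate format in turn. This is only a sketch and not the code from the attached Parser: the class `KerioDateDemo`, its `parse` method, and the second format string (`dd/MMM/yyyy HH:mm:ss`, assuming English month abbreviations) are assumptions based on the two samples above.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper (not part of LogMX): parses either of the two sample timestamp styles.
class KerioDateDemo {
    // Candidate formats, tried in order; both are assumptions derived from the samples
    private static final String[] FORMATS = {
        "dd/MM/yyyy HH:mm:ss.SSS", // e.g. 10/01/2020 12:57:46.733
        "dd/MMM/yyyy HH:mm:ss"     // e.g. 11/Mar/2020 12:08:24
    };

    static Date parse(String text) {
        for (String format : FORMATS) {
            // A fresh formatter per call, since SimpleDateFormat is not thread-safe
            SimpleDateFormat formatter = new SimpleDateFormat(format, Locale.ENGLISH);
            formatter.setLenient(false);
            try {
                return formatter.parse(text);
            } catch (ParseException e) {
                // This format does not fit; fall through to the next candidate
            }
        }
        throw new IllegalArgumentException("Unsupported date: " + text);
    }

    public static void main(String[] args) {
        System.out.println(parse("10/01/2020 12:57:46.733"));
        System.out.println(parse("11/Mar/2020 12:08:24"));
    }
}
```

In a LogMX Parser, logic like this would presumably go in `getAbsoluteEntryDate`, in place of the single `dateFormat.parse(...)` call shown in the first version.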

Basically Java Parsers can do everything, but the log format needs a minimum of consistency. If every log entry can have a different format, it may be tricky to write and maintain this Parser. For instance, in your example, the string " In " is not present after the date like in your first message, so what is the field for "Task 9032 handler BEGIN"?

Code: Select all

[28/01/2020 13:58:02.185][14668] {https} Task 9032 handler BEGIN

Also, in the last line, which field should the string "(Request..." after the ErrorCode and ErrorType go into?

Code: Select all

0x80042013 KOFF_E_REQUEST_FAILED (Request info: Batch move message store=aberger@eanw.nexter 

I'm waiting for your answers before posting a new version of the Parser that handles all these cases :)
Thanks,
Xavier
rodb
Posts: 9
Joined: Tue Mar 10, 2020 12:46 am

Re: Help with multi-line regex parser

Post by rodb »

Code: Select all

Task 9032 handler BEGIN
This one is for the message field.

And for the string

Code: Select all

(Request...
That needs to be appended to the message (if there is already data in it).
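
The rule described here (error code, then error type, then any trailing text appended to the message) could be sketched as follows. The class `ErrorLineDemo`, its methods, and the regex are hypothetical illustrations, not the code from the attached Parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of LogMX): splits an "0x..." line into code, type and remainder.
class ErrorLineDemo {
    // Group 1: error code, group 2: error type, group 3: optional trailing text
    static final Pattern ERROR_LINE =
            Pattern.compile("^\\s*(0x\\S+)\\s+(\\S+)(?:\\s+(.*?))?\\s*$");

    /** Returns {code, type, rest}, where rest is "" if nothing follows the type; null if no match. */
    static String[] split(String line) {
        Matcher m = ERROR_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String rest = (m.group(3) == null) ? "" : m.group(3);
        return new String[] { m.group(1), m.group(2), rest };
    }

    /** Appends text to the message buffer, adding a line break only if the buffer already has data. */
    static void appendToMessage(StringBuilder message, String text) {
        if (text.isEmpty()) {
            return;
        }
        if (message.length() != 0) {
            message.append('\n');
        }
        message.append(text);
    }

    public static void main(String[] args) {
        StringBuilder message = new StringBuilder("[#124] (B1CB) Exception of class HResultException");
        String[] parts = split("     0x80042013 KOFF_E_REQUEST_FAILED "
                + "(Request info: Batch move message store=aberger@eanw.nexter ");
        appendToMessage(message, parts[2]);
        System.out.println(parts[0] + " / " + parts[1]);
        System.out.println(message);
    }
}
```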

thanks again!

Rod
admin
Site Admin
Posts: 556
Joined: Sun Dec 17, 2006 10:30 pm

Re: Help with multi-line regex parser

Post by admin »

OK, thanks, but what about the "Process ID" now being surrounded by square brackets [] instead of parentheses ()? Was it a typo, or are both possible?
rodb
Posts: 9
Joined: Tue Mar 10, 2020 12:46 am

Re: Help with multi-line regex parser

Post by rodb »

Sorry, yes, that is a typo. Process ID is usually surrounded by square brackets.
admin
Site Admin
Posts: 556
Joined: Sun Dec 17, 2006 10:30 pm

Re: Help with multi-line regex parser

Post by admin »

Hello,

Here is a new version of the Parser handling these cases.
kerio-connect-parser.export
Let me know if you have any questions/problems!

Xavier
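
The source of the updated Parser is only available in the attachment, so the following is not that code. It is a rough sketch of how a first-line pattern could tolerate the variations discussed in this thread: a numeric or named month, a Process ID in either bracket style, an optional level, and an optional "In" location. The class `KerioHeaderDemo`, the method `parseHeader`, and the regex itself are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; this is NOT the code from the attached Parser.
class KerioHeaderDemo {
    static final Pattern HEADER = Pattern.compile(
            "^\\[(\\d{2}/\\w{2,3}/\\d{4} [^\\]]+)\\]\\s*" // Date (numeric or named month)
                + "[\\[(](\\d+)[\\])]\\s*"                 // Process ID in [] or ()
                + "(?:\\{([^}]*)\\}\\s*)?"                 // Level (optional)
                + "\\{([^}]*)\\}\\s*"                      // Emitter
                + "(?:In\\s+(.*)|(.*))$");                 // Location, or else message text

    /** Returns {date, pid, level, emitter, location, message}; absent fields are null. */
    static String[] parseHeader(String line) {
        Matcher m = HEADER.matcher(line);
        if (!m.matches()) {
            return null; // not the first line of an entry
        }
        String[] fields = new String[6];
        for (int i = 0; i < fields.length; i++) {
            String value = m.group(i + 1);
            fields[i] = (value == null) ? null : value.trim();
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] lines = {
            "[28/01/2020 13:58:02.185][14668] {https} Task 9032 handler BEGIN ",
            "[28/01/2020 13:58:02.185](21392){err}{communication} "
                + "In SCProvider\\Communicator.cpp:291 (Communicator::checkAndLogResult)"
        };
        for (String line : lines) {
            System.out.println(Arrays.toString(parseHeader(line)));
        }
    }
}
```

When only one `{...}` group is present, this sketch treats it as the emitter and leaves the level empty, which matches the `{https}` sample. A message that itself begins with "In " would be misread as a location, so a real Parser may need a stricter rule.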