Frequency of occurrence analysis in LibreOffice

Tuesday, November 19, 2019 

One fairly common analytic technique is finding out, for example, the rate at which something appears in a time referenced file, for example a log file.

Lets say you’re looking for the rate of some reported failure to determine, say, whether a modification or update had made it better or worse.  There are log analysis tools to do this (like Splunk), but one way to do it is with a spreadsheet.

Assuming you have a table with time in a column (say A) and some event text in another (say B) like:

11-19 16:51:03 a bad error happened
11-19 16:51:01 something minor happened
11-19 12:51:01 cat ran by

you might convert that event text to a numerical value (into, say, column C) for example by:

=IF(ISERROR(FIND("error",B1)),"",1)

FIND returns true if the text (“error”) is found, but returns an error if not. ISERROR inverts that and returns logical values for both. The IF ISERROR construction allows one to specify values if the text is found or not – a bit complex but the result in this case will be “” (blank) if “error” isn’t found in B1 and 1 if “error” is found.

Great, fill down and you have a new column C with blank or 1 depending if “error” was found in column B. Summing column C yields the total number of lines in which the substring “error” occurred.

But now we might want to make a histogram with a sum count of the occurrences within a specific time period, say “errors/hour”.

Create a new column, say column D, and fill the first two rows with date/time values separated by the sampling period (say an hour), for example:

11/19/2019 17:00:00
11/19/2019 16:00:00

And fill down; there’s a quirk where LibreOffice occasionally loses one second, which looks bad. It probably won’t meaningfully change the results, but just edit the first error, then continue filling down.

To sum within the sample period use COUNTIFS in (say) column E to count the occurrences in entire columns that meet a string of criterion: in this case three criterion have to be met: the value of C is 1 (not “”), The value of A (time) is before the start of the sampling period (D1) and after the end (D2). That is:

=COUNTIFS(C1:C500,"1",$A1:$A500,">="&$D2,$A1:$A500,"<"&$D1)

Filling this formula down populates the sampling periods with the count per sampling period, effectively the occurrence rate per period, in our example errors/hour.

Summary
Author Rating
1star1star1star1star1star
5 based on 1 votes
Software Name
LibreOffice Calc
Operating System
Linux Mint
Software Category
Office
Price
USD 0
Landing Page
Posted at 08:36:41 GMT-0700

Leave a Reply

162 Views