Fraudulent Activity Notifications
Problem statement
HackerLand National Bank has a simple policy for warning clients about possible fraudulent account activity. If the amount spent by a client on a particular day is greater than or equal to 2x the client’s median spending for a trailing number of days, they send the client a notification about potential fraud. The bank doesn’t send the client any notifications until they have at least that trailing number of prior days’ transaction data.
The Median
The median of a trailing of numbers is the middle number when all the numbers are sorted in ascending order. If there is an even number of numbers, the median is the average of the two middle numbers.
The Algorithm
- Loop over the expenditure array
- Get trailing days from expenditure array
- Since, The bank doesn’t send the client any notifications until there is enough data. Stop the looping and return the current notifications count.
- But if there is enough data, we sort the array in order to find the median.
- find the median (we’ll see three different flavors)
- And finally, If the amount spent by a client on a particular day is greater than or equal to the client’s median spending. Increase the notifications counter
But this algorithm will not suffice the requirements on performance.
The naive algorithm O(n log n)
This algorithm simply sorts the list of numbers in ascending order and then returns the middle number. This algorithm is simple to implement, but it is not very efficient, especially for large lists of numbers.
This problem can certainly be solved using a sorting algorithm to sort a list of numbers and return the value at the i th index. However, many sorting algorithms can’t go faster than n log n time. A median-finding algorithm can find the i th smallest element in a list in O(n) time.
The quickselect algorithm
This algorithm is a more efficient algorithm for finding the median of a list of numbers. The quickselect algorithm works by repeatedly partitioning the list around a pivot element. The pivot element is chosen so that it is greater than half of the elements in the list and less than half of the elements in the list. This partitioning process is repeated until the pivot element is the median of the list.
Quickselect is a selection algorithm to find the k'th
smallest element in an unordered list. It is closely related to the Quicksort sorting algorithm. Like Quicksort, it is efficient traditionally and offers good average-case performance, but has a poor worst-case performance.
The median-of-medians algorithm O(n)
.
This algorithm is a more efficient algorithm for finding the median of a list of numbers than the quickselect algorithm. The median-of-medians algorithm works by first finding the medians of all the sublists of the original list. The median of the medians is then used as the pivot element for the quickselect algorithm. This algorithm is more efficient than the quickselect algorithm because it does not need to sort the entire list of numbers.
Why 5?
The median-of-medians divides a list into sublists of length five to get an optimal running time. Remember, finding the median of small lists by brute force (sorting) takes a small amount of time, so the length of the sublists must be fairly small. However, adjusting the sublist size to three, for example, does change the running time for the worse.