0

Getting around "Too many data points to plot"

Just encountered this for my first time. I didn't really have anything useful I could aggregate or group by, as the intention of this chart was to show as much data as possible. So, I created something which would group them together only when necessary, keeping us below the 1,000 data point limit at all times.

Create some hidden data, and use the following:

ROUND(RANK(A:A)/CEILING(COUNT(A:A)/1000),0)

Group on this hidden data.

where A:A contains unique identifiers for your data.Now, the number of data point will be 1:1 up to 1,000, at which point they are grouped in pairs, then triplets etc, maintaining the maximum amount of visible data all the way.

11 comments

  • 0
    Avatar
    Janice Janczyn

    Hi Matthew,

    Thank you for posting in our community!  Would you be able to provide a few more details on your logic?  In particular I'd like to understand the purpose of the RANK function in this formula.

    Thank you,
                  Janice

     

     

  • 0
    Avatar
    Matthew Parker

    Hi Janice,

    So I first encountered the limit where data I was charting hit 1,0004 entries, exceeding the limit.

    The Rank(A:A) part assigns a rank to each of the ID fields (assuming ID is in column A). So the first item will be 1, then 2, etc. we then divide this number by the total number of elements we are charting against, divided by the maximum limit (1000).

     

    So, in my example, we are taking 1,004, dividing by the limit of 1,000 to get 1.004. This is wrapped in Ceiling(), so it is rounded up to 2. We are then dividing the unique rank given by 2, and then rounding this to the nearest integer. What this does, is it creates 2 equal values, so there will be 2 1's, 2 2's, 2 3's etc. This allows us to group on this, and it will keep the limit beneath 1,000 whilst maintaining the highest amount of precision.

     

    If the number of data points is <= 1,000, then we are dividing the rank by 1, so it won't group anything together at all. If the count lies between 1001 and 2000, we are dividing the rank by 2, which will mean the number of data points will be halved, which keeps it under 1000. if its between 2001 and 3000, we are dividing it by 3, so we will have between 667 and 1000 data points, etc. it basically ends up grouping stuff together more and more, to keep everything below 1000 total data points.

  • 0
    Avatar
    Janice Janczyn

    Hi Matthew,

    I see, your IDs in column A:A are text, correct? With this approach, the groupings are somewhat arbitrary as the number of items reach the incremental thousands thresholds.  Note that if A:A has <= 1000 items and there are duplicate values, these duplicates will be assigned the same rank and grouping these ranks will return fewer items than contained in A:A. Is this your intention?

    Thanks,
             Janice

  • 0
    Avatar
    Matthew Parker

    Hi,

    It may work better in my situation than some others, but yes my ID's are text. Each value is unique.

    The first part of them however is based from a time code they were generated, which relates to the order I want to show them in anyway. so the order these are displayed isn't interuptted in my use case, but may be depending on how ID's are generated. I couldn't see a way of doing the same without that dependency however.

  • 0
    Avatar
    Janice Janczyn

    Hi Matthew,

    Thank you for explaining. You've come up with a creative solution to this situation. A more generic approach that would work with both unique and non-unique IDs would be to repeat the values 1 - ROUND( n / 1000 ) (where n is the number of items in A:A) enough times until you have the same number of items as in A:A, as follows:

         REPEAT( CUMULATIVE( REPEAT (1, ROUND( COUNTALL( A:A ) / CEILING( COUNTALL( A:A ) / 1000 ) ) ) ),
                         CEILING( COUNTALL( A:A ) / 1000 ) )

    Thanks again for posting in our community!

    Janice

  • 0
    Avatar
    Matthew Parker

    Hi Janice,

    It may be me misinterpretting, but I just gave that a try and it seemed to be repeating the sequence 1..x, y times, which would be grouping things together which are far apart?

    My data currently has 1064 rows, so it is attempting to group htem into  532 groups of 2 values, which it does. However, the overall result of that formula is "1,2,3,4,5...,532,1,2,3,4,5...,532". This means that the value at the start is grouped together with the value half way through, whereas what I would need is the first value to be grouped together with the second value, so I would be looking for that to instead create an array as such: "1,1,2,2,3,3,4,4,5,5...,532,532".

  • 0
    Avatar
    Janice Janczyn

    That's correct. It depends on how you want to group your data. To group items closer together, wrap SORT around the formula:

     

         SORT( REPEAT( CUMULATIVE( REPEAT (1, ROUND( COUNTALL( A:A ) / CEILING( COUNTALL( A:A ) / 1000 ) ) ) ), 
                                    CEILING( COUNTALL( A:A ) / 1000 ) ),
                     ascending numeric )

     

    Thanks,
              Janice

  • 1
    Avatar
    Team JAS

    Hi there,

    I believe this post can answer my issue, but when trying to use the formulas on here, it endlessly loads.
    My client (setting them up as a pilot client) involves a lot of data points, and we are trying to make a dashboard that shows near-real-time, but also can show the past year of data (aggregated). We take a data point every 100 seconds, so as you can guess more than a day out the graph will not render.

    We have a little less than a years worth of data for them, but still it's every 100 seconds. 

    How can I have my data automatically aggregate depending on user-inputted date range? I want the default to be last 24 hours (will plot each data point), then options for a week, by month, quarter, and year. 

    Please let me know if more explanation is needed, but I believe this post is close to helping me out. Thank you!!

  • 0
    Avatar
    Janice Janczyn

    Hi there,

    One solution would be to group according to the number of days selected such that there are always 864 datapoints.

         24 hours * 60 minutes/hour * 60 seconds/minute = 86,400 seconds / 100 datapoints/second = 86,400 seconds
         86,400 seconds * 1 datapoint / 100 seconds = 864 datapoints

    Add hidden data with the formula to repeat the values 1 through 864 for every day in the range selected and then sort them in ascending order (this example assumes date selectors set the $startDate and $endDate variables as Unix datetimes):

         SORT( REPEAT( CUMULATIVE(REPEAT(1,864)), COUNTDAYS($startDate, $endDate) ), ascending numeric )

    This will return, for example, 1,1,2,2,3,3...864,864 for 2 days of data. Next, set the Group action on this hidden data (this will group all the 1s, 2s, 3s...864s) adn and on the series, set Aggregation > Sum to sum the values according to these groupings.

    Thank you,
                Janice

  • 0
    Avatar
    Team JAS

    Hi Janice,

    I tried implementing the formula:

        SORT( REPEAT( CUMULATIVE(REPEAT(1,864)), COUNTDAYS($startDate, $endDate) ), ascending numeric )

    But didn't have any luck. I also tried changing the $startDate and $endDate to @B:B (the respective column of Unix time points that I want to sort/group the data by)

    I couldn't find the hidden data reference either, so maybe that is my culprit. 

    Any more help is appreciated, thanks,

    Cole

  • 0
    Avatar
    Janice Janczyn

    Hi Cole,

    The formula I gave would go in the hidden data, not in a series. Go to the Properties tab for your Bar/Line Chart and you'll see a button to click to Add Hidden Data. Your series still points to your data while the hidden data is used to do the grouping.

    If you still have questions, please email support@klipfolio.com with more details about your data and your klip.

    Thank you,
                 Janice

Please sign in to leave a comment.