
Working with large CSV datasets

Hi,

I have a general Klipfolio performance question. A client has several data sources, including a handful of CSV files on an FTP server. Each file contains 100,000+ rows, and the files have IDs that link the information together (just like a relational database). The data will only be refreshed once a day, but my concern is interface performance. How will LOOKUP, GROUP, GROUPBY, SELECT, and other formulas behave?

I don't have any experience working with data sets this large and am looking for some general advice and best practices. The datasets are within the 10 MB limit (please increase it...).

Best regards,

Jorn

1 comment

  • Meggan King (Official comment)

    Hi Jorn -

    Great question and sorry for the delay!

    It really depends on the data itself. Simple numbers might be seamless, while long text strings can be slow, depending on how you are manipulating the data. There can also be a high number of rows and everything still works quickly and smoothly.

    The more the data is aggregated outside of Klipfolio, the better. In terms of specific formulas that can be slower: MAP(), DATASOURCE(), and SORT() with a very large number of rows. Using hidden tables can cause perceived performance issues, but the reality is that REF runs the entire formula every time it is called, so there is little benefit to keeping hidden tables and columns in the final design (I realize hidden tables can be really useful during the initial construction of Klips).
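
    For example, if the daily refresh can run a small script before the files reach Klipfolio, a sketch like the one below (the file names, ID column, and measures are placeholders, not your actual sources) can do the joins and aggregation up front with pandas, leaving the Klip with a small, flat table to SELECT and GROUPBY against:

        import pandas as pd

        # Placeholder file names and columns: substitute the client's real FTP exports.
        orders = pd.read_csv("orders.csv")        # 100,000+ row detail file
        customers = pd.read_csv("customers.csv")  # smaller lookup file keyed by customer_id

        # Do the relational join once here instead of a per-row LOOKUP inside the Klip.
        merged = orders.merge(customers, on="customer_id", how="left")

        # Aggregate down to the granularity the dashboard actually displays.
        summary = (
            merged.groupby(["region", "order_date"], as_index=False)
                  .agg(total_revenue=("revenue", "sum"),
                       order_count=("order_id", "count"))
        )

        # Upload this much smaller file as the Klipfolio data source;
        # it stays comfortably under the 10 MB limit.
        summary.to_csv("daily_summary.csv", index=False)

    The same idea applies to sorting: if a Klip needs ordered data, pre-sorting in the script avoids calling SORT() over the full row count.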

    If you run into issues once you start working with the customer's data, let us know and we can offer more specific help.

    Thanks,

    Meggan