Python data pre-processing

Hi Klipfolio,

Thanks for the great piece of software!

Klipfolio is great for visualization, but preparing data for it is a pain and takes an unreasonable amount of time.

I suggest letting users pre-process data with Python. For security reasons, you could allow only the numpy and pandas modules, and run Jupyter notebooks for debugging.

It would possibly be the best dataviz service ever.

Otherwise, if you have other ideas on how or where to host a Python pre-processing service, let me know. It would be a massive upgrade to Klipfolio.

Best regards,

Dmitrii

3 comments

  • 0
    Scott Lawrence

    Hi Dmitrii,

    Thank you for sharing your ideas here!  Can you share more about the kinds of pre-processing you are thinking about?  I'd like to better understand your use case and what you are trying to achieve.

    Many customers are already doing pre-processing of their data outside of Klipfolio and then pushing data to Klipfolio using our API.  Have you already explored this option?

    Thanks in advance for your inputs and for sharing the context of your use case.

    Cheers,

    Scott.

  • 0
    Dmitrii Beliakov

    Hi Scott,

    Thank you for your reply! Great that you are actually listening to customers. I appreciate it.

    The tool I am missing is DataFrame operations in Python using the pandas module, possibly the best data-crunching module on the planet. Without pre-processing we are limited to Klipfolio's functions, which often results in formulas hundreds of characters long being copied across columns. And if you need to modify one, you have to do it all over again.

    Typically this is about creating a subset of the data from the data source, applying some aggregation functions, etc. Klipfolio is really good at visualisation, but pre-processing is something that is really missing.
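
    As a rough illustration of that kind of pre-processing, here is a minimal pandas sketch that filters a subset and aggregates it. The column names (region, status, amount) and the sample rows are made up for the example and do not come from any real Klipfolio data source.

```python
import pandas as pd

# Hypothetical raw data standing in for whatever the data source returns.
raw = pd.DataFrame({
    "region": ["EU", "EU", "US", "US", "US"],
    "status": ["paid", "refunded", "paid", "paid", "refunded"],
    "amount": [120.0, 30.0, 200.0, 80.0, 15.0],
})

# Subset: keep only the paid orders.
paid = raw[raw["status"] == "paid"]

# Aggregate: total amount and number of orders per region.
summary = paid.groupby("region", as_index=False).agg(
    total=("amount", "sum"),
    orders=("amount", "count"),
)

# CSV text that could then be handed to the dashboard as a data source.
csv_text = summary.to_csv(index=False)
```

    The same subset-and-aggregate step written as spreadsheet-style formulas would have to be repeated per column, whereas here a change is made in one place.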

    I imagine that if you just gave us an option to manipulate the source data before feeding it into Klipfolio clips, it would greatly improve the efficiency and speed of development. It could be done in Python, R, or any other language that you consider safe to run.

    Yes, I was considering building our own pre-processing. However, it takes a lot of effort to build our own REST API, host it, etc. It is doable, it's not rocket science, but this is something that almost any Klipfolio user would benefit from.

    Our current solution is to build a small API using Python Flask and host it on PythonAnywhere. But again, wouldn't it be great to share such capabilities with everyone at Klipfolio? Wouldn't it be easier for you to write a small optional pre-processing module, where we simply upload Python code which outputs, let's say, CSV?
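
    To give an idea of how small such a CSV-producing service can be, here is a sketch using only the Python standard library. Note that it is a plain WSGI app rather than the Flask app mentioned above, and the ROWS data, rows_to_csv and app names are hypothetical placeholders, not part of any Klipfolio or PythonAnywhere API.

```python
import csv
import io

# Hypothetical pre-processed rows; a real service would compute these
# from the external data source on each request or on a schedule.
ROWS = [
    {"region": "EU", "total": 120.0},
    {"region": "US", "total": 280.0},
]


def rows_to_csv(rows):
    """Serialize a list of dicts to CSV text; the header comes from the first row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def app(environ, start_response):
    """Minimal WSGI app: every request gets the pre-processed data as CSV."""
    body = rows_to_csv(ROWS).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/csv; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

    Any WSGI host should be able to serve app as-is, and a dashboard could then read the URL like any other CSV data source.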

    Best regards,

    Dmitrii

  • 0
    Dmitrii Beliakov

    Just to add.

    To build such an API we need to connect to external sources, just like Klipfolio already does. We have to run cron tasks, just like you really do.

    But I will definitely look into the API option today. I didn't know it can receive data.
