Video: Uses of Machine Learning Applied to E-commerce

Here is the video of my talk "Uses of Machine Learning Applied to E-commerce", which took place at ENAE Business School as part of the "Ecommerce & Big/Small Data" Forum. In this talk I explain several algorithms used in e-commerce today, as well as the libraries that…

Handle missing categoricals with PMML

PMML, a markup language developed by the Data Mining Group, is in my opinion a much-needed standard in the Data Science ecosystem. PMML is basically an XML format for defining machine learning pipelines, which allows for (sort of) interoperability between different ML platforms. In particular, I have been working…
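As a rough illustration of what "handling missing categoricals" can mean in PMML, the sketch below uses Python's standard xml.etree.ElementTree to build a MiningSchema in which a categorical field declares a missingValueReplacement, then mimics how a consumer would fill the gap before scoring. The field names ("country", "basket_value") and the replacement value "ES" are hypothetical, and the fragment deliberately omits the PMML namespace, the enclosing PMML document and the DataDictionary entry (where the field would be declared with optype="categorical"). It also only treats absent or None inputs as missing; this is a sketch under those assumptions, not the approach described in the full post.

```python
import xml.etree.ElementTree as ET


def mining_field(name, replacement=None, treatment="asMode"):
    """Build a PMML <MiningField>; if `replacement` is given, missing values
    of this field are substituted with it before the model is evaluated."""
    attrs = {"name": name, "usageType": "active"}
    if replacement is not None:
        attrs["missingValueReplacement"] = replacement
        # Informational: records how the replacement value was derived.
        attrs["missingValueTreatment"] = treatment
    return ET.Element("MiningField", attrs)


def apply_replacements(schema, record):
    """Mimic a PMML consumer: fill inputs that are absent or None with the
    field's missingValueReplacement; other fields are left untouched."""
    filled = dict(record)
    for field in schema.iter("MiningField"):
        name = field.get("name")
        replacement = field.get("missingValueReplacement")
        if replacement is not None and filled.get(name) is None:
            filled[name] = replacement
    return filled


# Hypothetical schema: "country" is categorical and missing values become "ES"
# (assumed here to be the most frequent category in the training data).
schema = ET.Element("MiningSchema")
schema.append(mining_field("country", replacement="ES"))
schema.append(mining_field("basket_value"))

print(ET.tostring(schema, encoding="unicode"))
print(apply_replacements(schema, {"country": None, "basket_value": 42.0}))
```

Putting the replacement in the MiningSchema, rather than in the scoring code, means every PMML consumer that honours the attribute should apply the same substitution, which is part of the interoperability argument above.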

Video: Data Science Sessions in Murcia

On 21 April 2017, thanks to the support of Centic and Info de Murcia, about 80 people came along to let me talk their ears off for 3 hours about all things Data Science. Here is the video. You can find the slides on SlideShare.…

This is what a memory leak looks like

Left side of this chart: VSZ (virtual memory) and RSS (RAM) over time (obtained via ps) for a process using a poor implementation of KafkaClient in Java, which creates a new Kafka client per GET request. This is bad. Right side of the chart: current performance once I fixed the…
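The difference between the two sides of the chart comes down to object lifetime. The sketch below contrasts a handler that builds a new client on every GET request with one that lazily creates a single client and shares it. It is written in Python for brevity, although the post is about Java, and HypotheticalKafkaClient is an invented stand-in, not the kafka-clients API. The assumption is that a real client typically owns network connections and background threads, so creating one per request without closing it lets memory grow; here a counter simply records how many clients get created.

```python
import threading


class HypotheticalKafkaClient:
    """Invented stand-in for an expensive client (sockets, buffers, threads).
    `created` counts how many instances have ever been constructed."""
    created = 0

    def __init__(self):
        HypotheticalKafkaClient.created += 1

    def send(self, topic, value):
        return (topic, value)  # a real client would publish the record here


# Leaky pattern: a brand-new client for every GET request, never closed.
def handle_get_leaky(value):
    client = HypotheticalKafkaClient()
    return client.send("events", value)


# Fixed pattern: one client, created on first use and shared by all requests.
_client = None
_client_lock = threading.Lock()


def get_shared_client():
    global _client
    with _client_lock:  # avoid two threads creating the client concurrently
        if _client is None:
            _client = HypotheticalKafkaClient()
    return _client


def handle_get_shared(value):
    return get_shared_client().send("events", value)


if __name__ == "__main__":
    for i in range(1000):
        handle_get_leaky(i)
    print(HypotheticalKafkaClient.created)  # prints 1000
    for i in range(1000):
        handle_get_shared(i)
    print(HypotheticalKafkaClient.created)  # prints 1001
```

In Java the shared instance would usually live as a singleton or application-scoped bean and be closed once at shutdown, instead of once per request (or never).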

Note to self: Changing the log level in Apache Spark

Very quick note for future reference. Please ignore. Change the log level in Spark: easy peasy, you can do it programmatically in the application, like so: spark.sparkContext.setLogLevel("WARN"). Change the log level in YARN: this one took a while to find. You can just run spark-submit after first exporting this env var: export YARN_…